Schedule and course plan

Period 3

Where and When	Activity	Reading	Examination
February 4 13.15-15.00 B1	Lecture 1: Introduction, Boolean retrieval, course practicalities Hedvig Kjellström, Johan Boye	Manning Chapter 1, 2
February 7 10.15-12.00 B1	Lecture 2: Term vocabulary, dictionaries and tolerant retrieval Johan Boye	Manning Chapter 2, 3
February 11 13.15-15.00 B1	Lecture 3: Evaluation of search engines Jussi Karlgren	Manning Chapter 8
February 18 13.00-19.00 Orange	Computer hall session		Oral examination of Assignment 1 in front of computer
February 21 10.15-12.00 V1 (note)	Lecture 4: Scoring, weighting, vector space model Hedvig Kjellström	Manning Chapter 6, 7
March 4 13.15-15.00 B1	Lecture 5: Retrieval of documents with hyperlinks Johan Boye, Hedvig Kjellström	Manning Chapter 21, Avrachenkov Sections 1-2
March 18 13.15-15.00 B1	Lecture 6: Evaluation II Jussi Karlgren	Manning Chapter 9, Robertson
March 18 15.00-19.00 Gul	Computer hall session		Oral examination of Assignment 2 in front of computer
March 21 10.15-12.00 B1	Lecture 8: Some useful additions to a search engine, Random Indexing Viggo Kann	Sahlgren
March 25 13.15-15.00 B1	Lecture 7: Probabilistic information retrieval, language models Hedvig Kjellström	Manning Chapter 11, 12

Period 4

Where and When	Activity	Reading	Examination
April 8 15.00-19.00 Orange	Computer hall session		Oral examination of Assignment 3 in front of computer
April 8 13.15-15.00 B1	Lecture 9: Guest lectures. Anders Friberg, KTH: Music Information Retrieval. The recent paradigm shift in music distribution has created a need for new methods of browsing, searching, and recommending music on the Internet. Given the size of current music databases, typically around 10 million songs or more, automatic methods are particularly useful. An overview of methods and challenges in the field, with some snapshots from KTH research, will be presented. Hercules Dalianis, SU: Clinical text retrieval - some methods and some applications. Electronic patient records contain a vast source of information, both structured, such as diagnosis codes, drug codes, lab values, and time stamps, and unstructured, in the form of free text. Methods, both rule-based and machine-learning-based, for retrieving this information are presented. Applications such as diagnosis code assignment, hospital-acquired infection detection, and adverse drug event detection will be discussed.
April 15 13.15-15.00 B1	Lecture 10: Guest lectures. Simon Stenström and Martin Nycander, Findwise: Search solutions from the Trenches. The presentation will describe the difference between a search index and a search solution, and the process of building a search solution. We will show real-world examples that explain some of the problems we encounter when working with different parts of search solutions. We will discuss source data, what you should index, how you create a real search query, and what you can create from a backing search engine. Magnus Rosell, FOI: Text Clustering Exploration. Text clustering can be used to explore the contents of an unknown text set. Presenting text clusters so that humans can grasp them is very important. If the texts are associated with further information, which clusters are potentially more interesting can be decided automatically.
April 22 13.15-15.00 B1	Lecture 11: Guest lecture. Filip Radlinski, Microsoft Research Cambridge: Evaluating Search Engines Without Human Judgments. How can you tell how well a search engine is performing? Traditional search engine evaluation takes a sample of search queries, fetches results, and manually assesses them. This approach often works well, but also poses a number of challenges. In particular, it can be difficult to be sure that the experts assessing queries really know what the users were looking for with a particular query, in the context of a particular time and place. In the competitive space of commercial search, these challenges can become huge. This lecture will consider some of the most difficult aspects of manually assessing search engine results, and contrast manual assessment with an alternative: observing the behavior of actual users. Search engine logs can record a great deal of user interaction data, and we will see some of the many ways that these interactions can be interpreted as feedback on search quality. The talk will consider the challenges and assumptions that different forms of online evaluation make, as well as methods that can be used to design online metrics that reflect user satisfaction most accurately and most efficiently.
April 29 13.15-15.00 B1	Lecture 12: Guest lecture CANCELLED
May 16 09.00-13.00 Fantum, Lindstedtsv 24, floor 5	Project presentations		Written report hand-in; oral presentation in front of poster