# Project Proposals

Please email your preferred project options to dd143x.pawel@gmail.com by the 5th of February.

## Project Ideas

**Title: Max Flow Algorithms**

**Theme:** Algorithms, Graph Theory

**Subject:** Finding maximum flows in graphs is a classical problem in algorithms and optimization. A survey on algorithms for maximum flow can be found here:

http://cacm.acm.org/magazines/2014/8/177011-efficient-maximum-flow-algorithms/fulltext

In recent years there have been several exciting developments in this area, with several new algorithms discovered. The idea of this project is to explore how well these new algorithms work in practice by implementing and comparing them to classic algorithms.

There are at least two possible directions/projects:

- In 2013, Orlin [1] gave an exact max-flow algorithm running in time O(n·m), resolving a long-standing open problem.
- In works by Christiano et al. [2] and Lee et al. [3], fast algorithms for computing *approximate* maximum flows, utilizing electrical flows and linear algebra, were discovered.
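
For experimental comparison, a classical augmenting-path algorithm makes a natural baseline. Below is a minimal Edmonds-Karp sketch (BFS-based Ford-Fulkerson) in Python; the graph representation and node names are illustrative, and its O(VE²) bound is far from the algorithms above.

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp: repeatedly augment along shortest (BFS) paths.

    capacity: dict mapping edge (u, v) -> capacity; absent pairs mean 0.
    Runs in O(V * E^2) -- a baseline, not a state-of-the-art method.
    """
    residual = {}
    for (u, v), c in capacity.items():
        residual[(u, v)] = residual.get((u, v), 0) + c
        residual.setdefault((v, u), 0)           # reverse edge for undo
    adj = {}
    for (u, v) in residual:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, [])

    flow = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow                          # no augmenting path left
        # Reconstruct the path, find its bottleneck, and augment.
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[e] for e in path)
        for (u, v) in path:
            residual[(u, v)] -= bottleneck
            residual[(v, u)] += bottleneck
        flow += bottleneck
```

For example, on a small network with capacities s→a: 3, s→b: 2, a→t: 2, b→t: 3 and a→b: 1, the maximum flow is 5.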

**References:**

[1] Orlin, J.B. Max flows in O(nm) time, or better. In Proceedings of the Annual ACM Symposium on Theory of Computing. ACM Press, New York, 2013, 765–774.

[2] Christiano, P., Kelner, J.A., Madry, A., Spielman, D.A., and Teng, S.-H. Electrical flows, Laplacian systems, and faster approximation of maximum flow in undirected graphs. In Proceedings of the Annual ACM Symposium on Theory of Computing. ACM Press, New York, 2011, 273–282.

[3] Lee, Y.T., Rao, S., and Srivastava, N. A new approach to computing maximum flows using electrical flows. In Proceedings of the Annual ACM Symposium on Theory of Computing. ACM Press, New York, 2013, 755–764.

**Title:** **Factoring Integers with SAT Solvers**

**Subject:** Factoring large integers is a well-known computational problem that is believed to be hard to solve efficiently, an assumption on which much of modern cryptography rests. One possible approach to factoring is to reduce it to the (perhaps even harder) SAT problem and use a modern SAT solver. A point in favour of this approach is that a great deal of effort has gone into building optimized SAT solvers in recent years. The project is about investigating how well this approach holds up against specialized factoring algorithms, both by examining what previous research has found and by carrying out your own experiments.
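
To give a flavour of the reduction, the equation p × q = N is encoded as a Boolean circuit (partial products plus adders) and each gate is translated into CNF clauses. The sketch below shows only the Tseitin clauses for a single AND gate, the building block for partial products, together with a brute-force check used to validate the encoding; a complete pipeline would also encode the adders and hand the CNF to a solver such as MiniSat. The DIMACS-style variable numbering is an assumption made for illustration.

```python
def and_gate_clauses(a, b, c):
    """CNF clauses asserting c <-> (a AND b), Tseitin-style.

    a, b, c are DIMACS-style variable numbers; a negative literal
    denotes negation. The partial products of a multiplier circuit
    for p * q = N are exactly such AND gates.
    """
    return [
        [-a, -b, c],   # (a AND b) implies c
        [a, -c],       # c implies a
        [b, -c],       # c implies b
    ]

def satisfies(clauses, assignment):
    """Check a complete assignment (dict var -> bool) against a CNF."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )
```

Enumerating all eight assignments confirms the clauses hold exactly when c equals a AND b.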

**Supervisor:** Per Austrin

**Title:** **Sentiment Classification in Social Media**

**Theme:** Natural language processing, Classification

**Subject:** The proliferation of social media has generated much interest amongst scientists and within industry: people's opinions are freely expressed, and this carries value for many people. However, social media is challenging to study; despite the availability of data, extracting interesting information is difficult for many reasons, including short-form communication, abbreviations, slang, lack of capitalization, sarcastic humour, and poor spelling, punctuation and grammar. One type of information of value is sentiment: the emotional payload of a unit of social media. Two prevailing approaches attempt to solve the challenge of sentiment classification: machine-learning and lexicon-based methods. This project will investigate these approaches with a view to comparing their performance using datasets that contain gold-standard human judgements of sentiment.
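
As a taste of the lexicon-based side, a few lines suffice for a first prototype. The mini-lexicon and negation rule below are invented for illustration; real lexicons such as AFINN contain thousands of scored entries, and the machine-learning side would instead train a classifier on labelled examples.

```python
# Illustrative mini-lexicon; real resources (e.g. AFINN) are far larger.
LEXICON = {
    "good": 2, "great": 3, "love": 3, "happy": 2,
    "bad": -2, "awful": -3, "hate": -3, "sad": -2,
}
NEGATORS = {"not", "never", "no"}

def sentiment(text):
    """Sum lexicon scores over tokens, flipping the sign of the word
    that follows a negator. Lower-casing and punctuation stripping are
    a crude nod to the noisiness of social media text."""
    score, flip = 0, 1
    for token in text.lower().split():
        token = token.strip(".,!?#@'\"")
        if token in NEGATORS:
            flip = -1
            continue
        score += flip * LEXICON.get(token, 0)
        flip = 1
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"
```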

**Supervisor:** Richard Glassey

**Title:** **Detecting Visual Plagiarism**

**Theme:** Computer vision, Classification

**Subject:** Textual plagiarism detection is routinely used within universities to provide evidence that students have committed plagiarism in their reports. However, visual artefacts within a report are often discarded during analysis. Clearly, building a visual corpus demands more resources than a textual corpus, but this raises the concern: if a picture paints a thousand words, should we not be equally concerned about its plagiarism potential? As reports may combine several types of picture (photograph, diagram, graph) as well as deliberate attempts to obfuscate the true source of a picture (rotation, scaling, cropping), several complementary methods may be required to arrive at a solution that balances accuracy with reasonable performance.
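
One lightweight building block for such a comparison is perceptual hashing, which tolerates mild scaling and re-encoding (though not rotation). The sketch below computes an average hash over an already downsampled grayscale matrix; a real pipeline would first decode and resize images, e.g. with a library such as Pillow, and would combine several complementary signatures.

```python
def average_hash(pixels):
    """Average hash of a small grayscale image given as a list of rows.

    Each bit records whether a pixel is above the mean intensity, so
    visually similar pictures yield hashes with a small Hamming distance.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]

def hamming(h1, h2):
    """Number of differing bits; a small distance suggests near-duplicates."""
    return sum(a != b for a, b in zip(h1, h2))
```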

**Supervisor:** Richard Glassey

**Title:** **Are They Paying Attention?**

**Theme**: Computer vision

**Subject:** A concern for a lecturer is whether students are paying attention during a lecture; various context switches can recapture attention, and knowing when to deploy them would be a benefit. However, attention can be difficult to gauge for various reasons: large class sizes, low lighting conditions and a dependence upon technology to capture data for analysis in the first place. One potential solution is to apply face detection in combination with image capture (stills or video). Now familiar in social media and photo-editing applications, this project considers if and how face detection could be used to measure 'group attention': that is, given a crowd of size n, what proportion of n is paying attention, and how stable this is across the duration of a lecture. Ideally, this project will investigate the limits of state-of-the-art face-detection algorithms, whilst also considering the engineering challenges of delivering such a project.

**Supervisor:** Richard Glassey


**Title**: **Brain signal pattern recognition**

**Theme:** Algorithms, Machine learning, Classification, Signals

**Subject:** Pattern recognition and machine learning have significantly advanced the field of biological data analysis, leading to the development of effective diagnostic tools and supporting research efforts. The contribution of novel pattern recognition methods has been particularly appreciated in brain data mining, as this new approach allows for exploratory search for spatio-temporal patterns in large quantities of high-dimensional, nonstationary recordings of brain activity. The emerging trend is to combine machine learning techniques with brain-inspired computing algorithms to address the increasingly demanding objectives of brain signal analysis in novel applications.

Below you can find a set of alternative projects (they can be treated individually or in combination).

**Possible essay projects:**

- Develop your own approach or build upon existing approaches to a specific brain signal pattern recognition problem, e.g.:
  - electroencephalographic (EEG) signal classification for a brain-computer interface (BCI),
  - EEG-based epileptic seizure prediction (identifying precursors in high-dimensional brain signal recordings),
  - automated sleep scoring based on physiological signals including EEG.

- Alternatively, select and compare a few existing state-of-the-art methods. Focus on selected aspects of a brain signal pattern recognition problem of your choice (handling signal, extracting patterns, classifying and interpreting brain signal correlates)
- Discuss key challenges, emerging trends and propose future applications for brain signal recognition methodology.

**Supervisor:** Pawel Herman

**Title**: **Computer-aided medical diagnostics**

**Theme: **Artificial intelligence, Classification, Machine learning, Algorithms

**Subject:** Computer-aided diagnosis has been extensively validated in various medical domains, ranging from biomedical image or signal analysis to expert systems facilitating the process of decision making in clinical settings. Although the usefulness of computational approaches to medical diagnostics is beyond any doubt, there is still a lot of room for improvement to enhance the sensitivity and specificity of algorithms. The diagnostic problems are particularly challenging given the complexity as well as diversity of disease symptoms and pathological manifestations. In the computational domain, a diagnostic problem can often be formulated as a classification or inference task in the presence of multiple sources of uncertain or noisy information. This pattern recognition framework lies at the heart of the medical diagnostics projects proposed here.

Below you can find a set of alternative projects (they can be treated individually or in combination).

**Possible essay projects:**

- Define a diagnostic problem within the medical domain and examine the suitability of machine learning, connectionist (artificial neural network-based), statistical or soft computing methods for your problem.
- Survey the state-of-the-art in computational tools supporting classification of disease symptoms and comparatively examine the diagnostic performance of some of them on a wide range of available benchmark data sets. Define a measure for diagnostic performance.
- Discuss the most recent trends in the field and address some of the urgent challenges for computer-assisted diagnostics in medicine.
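
Whichever method is examined, diagnostic performance is conventionally reported as sensitivity and specificity; a minimal sketch of that computation (the binary labels in the example are made up):

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity (true positive rate) and specificity (true negative
    rate) for binary labels, with 1 = diseased and 0 = healthy."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), tn / (tn + fp)
```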

**Supervisor:** Pawel Herman


**Title:** **Brain-inspired (biomimetic) computing**

**Theme:** Algorithms, Connectionist systems, Networks

**Subject:** Current computational methods, algorithms and available software turn computers into machines that are particularly effective at bookkeeping, solving complex but well-defined problems, searching for specific patterns, etc. However, today’s computers perform rather poorly in tasks that require multi-modal perception to identify complex, ill-defined patterns in large data streams, or that call for common-sense reasoning and the handling of ambiguity, possibly at the cost of precision or speed, in order to solve a range of real-world problems and meet growing demands for robustness, flexibility, adaptation and computational scalability (for example, the big-data challenge). Brain- or neural-inspired computing is an emerging field of research that aims to design such efficient algorithms based on the principles used by the nervous system, first and foremost the brain, to process generic information. In the family of neural network-based (connectionist) systems, the focus is on the biomimetic nature of the network architectures and learning mechanisms. Some of these connectionist methods devised in the realm of brain-inspired computing achieve human-competitive performance in recognition tasks.

In this project, students are first expected to familiarise themselves with the state-of-the-art methods in the emerging field of computational methods inspired by the brain’s neural networks (the connectionist approach) or by more general cognitive architectures. Using this background, students will be in a position to devise their own method or build upon existing techniques to address a selected computational problem. The range of potential applications is wide: it includes, among others, problems with high-dimensional (multivariate) data having complex relationships, requiring explorative search for interesting multi-dimensional patterns with a potentially hierarchical structure (low-level features that serve as building blocks for high-level data representations) and offering the possibility to perform a classification task or make inferences. Some examples of broad applications are image analysis (or pattern recognition generally), speech recognition, data mining (e.g. medical, financial, industrial), high-dimensional time-series prediction, etc. In the course of the project, special attention should be paid to the scale of the developed computational algorithm, implementation challenges, modularisation and, most importantly, functionality (robustness to noisy conditions, flexibility, effective learning from the environment, the capability to handle unsupervised or semi-supervised learning scenarios, etc.).

**Supervisor:** Pawel Herman


**Title:** **Automated scheduling, e.g. university timetabling**

**Theme:** Artificial intelligence, Machine learning, Algorithms, Optimisation

**Subject:** Planning is one of the key aspects of our private and professional lives. Whereas planning our own daily activities is manageable, scheduling in large multi-agent systems, with considerable amounts of resources to be allocated in time and space subject to a multitude of constraints, is a truly daunting task. In consequence, scheduling and timetabling, as prime representatives of hard combinatorial problems, are increasingly addressed algorithmically using the computational power of today's computers. This computer-assisted practice of setting up timetables for courses, students and lecturers has also gained a lot of interest at universities around the world and still constitutes an active research topic.

In this project, students can address a scheduling problem of their own choice, or they can use available university timetabling benchmark data and tailor it to the project's needs. An important aspect of such a project would be to select or compare different algorithms for combinatorial optimisation and to define a multi-criterion optimisation objective. It could be an opportunity to test computational intelligence and machine learning methodology.
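
As a minimal illustration of the optimisation view, the sketch below counts constraint violations in a toy timetable and reduces them by local search. The single hard constraint (two events sharing a lecturer cannot share a slot) and all names are invented; a real project would handle rooms, student groups and soft constraints, and compare several search strategies.

```python
import random

def conflicts(timetable, same_lecturer):
    """Count pairs of events in the same slot whose lecturer is
    shared -- the single (toy) hard constraint used here."""
    events = list(timetable)
    return sum(
        1
        for i, e1 in enumerate(events)
        for e2 in events[i + 1:]
        if timetable[e1] == timetable[e2] and same_lecturer(e1, e2)
    )

def hill_climb(timetable, slots, same_lecturer, steps=2000, seed=0):
    """Move one random event to a random slot; keep the move unless it
    increases the number of conflicts."""
    rng = random.Random(seed)
    current = dict(timetable)
    cost = conflicts(current, same_lecturer)
    for _ in range(steps):
        event = rng.choice(list(current))
        old_slot = current[event]
        current[event] = rng.choice(slots)
        new_cost = conflicts(current, same_lecturer)
        if new_cost <= cost:
            cost = new_cost
        else:
            current[event] = old_slot    # revert a worsening move
    return current
```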

**Supervisor:** Pawel Herman

**Title:** **Stock forecasting, financial data mining**

**Theme:** Machine learning, Algorithms, Time-series prediction

**Subject:** Stock trading is one of the most common economic activities in the world. Stock prices are very dynamic and commonly undergo quick changes due to the intrinsic nature of the financial domain. From a computational perspective, intelligent trading can be formulated as a data prediction (or, more specifically, time-series prediction) problem that involves both known parameters and unknown factors. The overarching idea is to design an algorithm that provides accurate predictions, thus allowing optimal trading decisions to be made. In this domain, machine learning and soft computing methods have recently shown great potential.

**Possible essay projects:**

- Compare some of the most recent approaches to financial time series prediction and validate their performance on available benchmark data sets (e.g., http://www.stockhistoricaldata.com/download).
- Propose a method with your own novel component and verify its suitability for the problem of financial time series analysis on different benchmark data sets.
- Concentrate on one or two of the emerging challenges in computer-based stock forecasting and address them using selected methodology. Reflect on the statistical nature of the data that reflects complex characteristics of financial markets.
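
Whatever method is proposed, it should at least beat naive baselines. The sketch below implements a moving-average predictor and a mean absolute error score; the price series in the test is synthetic.

```python
def moving_average_forecast(prices, window=3):
    """Predict each next price as the mean of the previous `window`
    prices. Returns (actual, predicted) pairs from index `window` on."""
    return [
        (prices[t], sum(prices[t - window:t]) / window)
        for t in range(window, len(prices))
    ]

def mean_absolute_error(pairs):
    """Average absolute gap between actual and predicted values."""
    return sum(abs(actual - pred) for actual, pred in pairs) / len(pairs)
```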

**Supervisor:** Pawel Herman

**Title:** **Intelligent control systems**

**Theme:** Machine learning, Artificial intelligence, Algorithms, Control, Soft computing

**Subject:** There is a clear trend for smarter machines that are able to collect data, learn, recognize objects, draw conclusions and perform behaviours to emerge in our daily life. Advanced intelligent control systems affect many aspects of human activity and can be found in a wide range of industries, e.g. healthcare, automotive, rail, energy, finance, urban infrastructure and consumer electronics, among others. By adapting and emulating certain aspects of biological intelligence, this new generation of control approaches makes it possible to address newly emerging challenges and needs, build large-scale applications, integrate systems, implement complex solutions and meet growing demands for safety, security and energy efficiency.

**Possible essay projects:**

- Select a real-world control problem (traffic control, energy management, helicopter or ship steering, industrial plant control, financial decision support and many others) and propose a new approach using machine learning and soft computing methodology (computational intelligence) that enhances functionality, automation and robustness when compared to classical solutions.
- Demonstrate functional (and other) benefits of “computationally intelligent” control approaches in relation to the classical methodology in a range of low-scale control problems (benchmarks). Discuss a suitable framework of comparison and potential criteria.
- Consider a control robotic application with all constraints associated with autonomous agents and real-world environments (which can be emulated in software). Propose “computationally intelligent” methods to enable your robot agent prototype to robustly perform complex tasks (learn from the environment, evolve over time, find solutions to new emerging problems and adapt to new conditions among others).

**Supervisor:** Pawel Herman

**Title:** Web document classification

**Theme:** Pattern recognition, Machine learning, Algorithms, Statistics, Computational linguistics

**Subject:**

**Possible essay projects:**

**Supervisor:** Pawel Herman

**Title**: **Visualisation of neural data**

**Theme:** Visualisation / Simulations

**Subject**: Visualisation is one of the most neglected aspects of the rapidly developing field of computational biology. Only recently can we observe an emerging trend for combining neural simulation frameworks with visualisation software. Still, there is a plethora of challenging problems that need to be urgently addressed (high-dimensional data, pre-processing, integration with simulation software, demands on purely visual aspects, interactive environments) to render visualisation a practical tool in computational studies. This is envisaged to facilitate computational modelling and to assist in demonstrating scientific findings.

**Possible essay projects:**

- Visualisation of existing data produced by models (different types of high-dimensional spatiotemporal data are available).
- Conceptual integration with the simulation environment to help with data pre-processing (or post-processing) and facilitate an interactive mode for the user.
- Review of the state-of-the-art methodology and a motivated choice of a tool for the computational problem at hand.

**Supervisor:** Pawel Herman

**Title**: **Optimisation and parameter search in computational modelling**

**Theme:** Algorithms

**Subject**: A model's parameters have a decisive effect on its behaviour and dynamics. At the same time, the search for parameters is the most tedious component of computational modelling. Neural simulations are no exception; on the contrary, since they account for nonlinear and stochastic effects in brain data, parameters need to be carefully tuned to obtain a desirable functional and/or dynamical outcome. This optimisation procedure is commonly carried out manually on a trial-and-error basis. It is thus desirable to automate this tedious process by providing an effective parameter search and optimisation scheme. One of the key challenges to address is the computational efficiency of the implemented method and the definition of a cost function based on the existing "manual" evaluation criteria. Tests in the project will be performed with the use of existing neural models, or a low-scale simulation demo will be developed.

**Possible essay projects:**

- Define a cost function that reflects the fundamental model evaluation criteria and propose an effective way of calculating it.
- Propose a computationally effective way of evaluating the cost function (see point 1).
- Review and propose a parameter search method (from the existing approaches) that matches the specific character of computational modelling.
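
A minimal automated alternative to manual tuning is random search over the parameter space. In the sketch below, the quadratic cost function is a stand-in; in a real project the cost would run the neural model and score its output against the "manual" evaluation criteria.

```python
import random

def random_search(cost, bounds, n_samples=500, seed=0):
    """Sample parameter vectors uniformly within `bounds` and keep the
    best one. bounds is a list of (low, high) pairs, one per parameter.
    Simple, embarrassingly parallel, and a strong baseline for more
    elaborate schemes such as evolutionary or Bayesian optimisation."""
    rng = random.Random(seed)
    best_params, best_cost = None, float("inf")
    for _ in range(n_samples):
        params = [rng.uniform(lo, hi) for lo, hi in bounds]
        c = cost(params)
        if c < best_cost:
            best_params, best_cost = params, c
    return best_params, best_cost
```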

**Supervisor:** Pawel Herman

**Title**: **Multi-scale brain simulations**

**Subject**: Simulations of neural systems and the brain can be performed on different scales; e.g. we could try to simulate every single neuron in as much detail as possible, or we could assume populations of neurons to be the basic computational units and neglect the dynamics of single neurons. Libraries exist to simulate neural systems at several of these scales (e.g. GENESIS, NEURON, NEST, Nexa). These can be called from C++ or Python and run on desktop machines as well as on supercomputers.

**Possible essay projects:**

- Implement some basic models in available simulators.
- Compare the type of simulations they can perform – what is the “correct” scale to perform brain simulations?
- As these simulations get closer to mimicking real brains, what are the implications for medicine and computing? Ethical concerns?
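
At the single-neuron end of the scale, a leaky integrate-and-fire model is the usual entry point; dedicated simulators such as NEURON or NEST provide far more biophysical detail and scale. The parameter values in this forward-Euler sketch are illustrative.

```python
def simulate_lif(input_current, dt=0.1, tau=10.0, v_rest=0.0,
                 v_threshold=1.0, v_reset=0.0):
    """Leaky integrate-and-fire neuron integrated with forward Euler.

    input_current: one input value per time step of size dt.
    Returns the indices of the time steps at which the neuron spiked.
    """
    v = v_rest
    spikes = []
    for t, i_in in enumerate(input_current):
        # Membrane equation: dv/dt = (-(v - v_rest) + input) / tau
        v += dt * (-(v - v_rest) + i_in) / tau
        if v >= v_threshold:
            spikes.append(t)
            v = v_reset    # fire and reset
    return spikes
```

With a constant supra-threshold input the model fires regularly; below threshold it stays silent.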

**Supervisor:** Pawel Herman

**Title:** **Visualization and classification of experimental neural data**

**Theme:** Biology, Nervous System, Machine Learning

**Subject:** In biological experiments, electrical activity in nervous tissue is often recorded in multiple places at the same time. Multielectrode recording provides a great amount of data but poses a problem of identification and classification of signals. A major difficulty arises from the variability of neural responses to a standard stimulus, which is worked around by collecting data from more repetitions. Intelligent data processing should allow the identification of individual neurons and groups of neurons by their involvement in the neural response at different phases of the spatio-temporal pattern.

**Possible essay projects:**

- Interactive visualization of experimental data
- Automatic identification of individual neurons in the data pool
- Classification of neurons by their role in the activity pattern

**Supervisor:** Alex Kozlov

**Title:** **Analysis and synthesis of neuronal morphology**

**Theme:** Cell Biology, Nervous System, Topology, Statistics, Machine Learning

**Subject:** Neurons have extremely complicated shapes determined by their role and place in the nervous system. However, despite great individual variability, neurons are classified by morphological types. The rapid growth of public repositories of neuromorphological data allows the development of mathematical methods for neuron classification.

**Possible essay projects:**

- Characterization of neuron morphology
- Mathematical synthesis of artificial neurons
- Automatic classification of morphological neuron types

**Supervisor:** Alex Kozlov

**Title:** **Multi-objective optimization**

**Theme:** Applied Mathematics, Decision Making, Optimization

**Subject**: In multi-objective optimization, several goal functions are optimized simultaneously. Due to possible inherent conflicts, a single optimal solution does not exist, and optimality is considered as a trade-off between objectives. This gives rise to interesting semi-heuristic approaches and opens opportunities for experimentation.

**Possible essay projects:**

- Inverse parameter estimation using multiobjective approach
- Mathematical synthesis of artificial neurons
- Evolutionary algorithms for solving multi-objective problems
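
The central object in this setting is the Pareto front: the solutions not dominated by any other. A minimal sketch, assuming all objectives are to be minimized:

```python
def dominates(a, b):
    """True if objective vector a is at least as good as b everywhere
    and strictly better somewhere (all objectives minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(points):
    """Return the non-dominated subset of a list of objective vectors."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```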

**Supervisor:** Alex Kozlov

**Title:** **Structured prediction**

**Subject:** An exciting new direction in supervised machine learning is to learn functions that map to structures, not just single values. For example, we might learn mappings from sequences (or bags) of words to trees. Structured learning is particularly promising in domains such as computational linguistics and computational biology. In this thesis you will explore the techniques in this area and evaluate the results of various approaches over a domain of your choosing.

**Reference:**

Search-based structured prediction, Hal Daumé III, John Langford and Daniel Marcu. Machine Learning (2009) 75: 297–325

**Supervisor:** Michael Minock

**Title: Grounding Language in Perception**

**Subject:** Quoting Jeffrey Siskind:

Suppose that you were a child. And suppose that you heard the utterance 'John walked to school'. And suppose that when hearing this utterance, you saw John walk to school. And suppose, following Jackendoff (1983), that upon seeing John walk to school, your perceptual faculty could produce the expression GO(John, TO(school)) to represent that event. And further suppose that you would entertain this expression as the meaning of the utterance that you just heard. At birth, you could not have known the meanings of the words 'John', 'walked', 'to', and 'school', for such information is specific to English. Yet, in the process of learning English, you come to possess a mental lexicon that maps the words 'John', 'walked', 'to', and 'school' to representations like 'John', 'GO(x,y)', 'TO(x)', and 'school', respectively. This paper explores one way that this might be done.

In this thesis, you are to replicate (some variant of) Siskind's experiment and comment on its suitability for practical application.
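
The core idea of cross-situational learning is easy to prototype: for each word, intersect the sets of candidate meanings over all utterances in which the word occurs. The toy corpus below is invented, and Siskind's actual model additionally copes with noise, homonymy and compositional meanings.

```python
def cross_situational(corpus):
    """corpus: list of (words, meanings) pairs, where words is the
    utterance's token list and meanings the set of meaning symbols the
    perceptual faculty produced for the observed scene.

    Returns, for each word, the intersection of the meaning sets of
    every utterance the word appeared in."""
    hypotheses = {}
    for words, meanings in corpus:
        for word in words:
            if word in hypotheses:
                hypotheses[word] &= set(meanings)
            else:
                hypotheses[word] = set(meanings)
    return hypotheses
```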

**Reference:**

Siskind, J.M. A Computational Study of Cross-Situational Techniques for Learning Word-to-Meaning Mappings. Cognition, 61(1-2):39–91, October/November 1996.

**Supervisor:** Michael Minock

**Title:** **Replication of Lambda-WASP**

**Subject:** A baby hears word sequences coupled with its pre-linguistic understanding of objects and events in the world. From this, the baby learns knowledge that lets it map from word sequences to meaning expressions (natural language understanding) and from meaning expressions to word sequences (natural language generation). Raymond Mooney's group looked at this problem for natural language understanding in their 2007 paper 'Learning Synchronous Grammars for Semantic Parsing with Lambda Calculus'. Attempt to replicate the results of this work and evaluate their claims. Time permitting, review additional work that has looked at this problem more recently. Critically discuss.

**Supervisor:** Michael Minock

**Title:** Recursion in PostgreSQL. How much can it do?

**Subject:** Recursion has been added to SQL in general and PostgreSQL in particular. But how powerful is it? How does it compare to the various extensions of Datalog (e.g. stratified negation)? How well does it perform? How well does recursion interact with regular view definitions? In this thesis you will deeply explore, document and evaluate recursion in SQL.

**Supervisor:** Michael Minock

**Title:** **Query Containment**

**Subject:** In a variety of settings (mobile caches, query optimization, etc.) it is useful to be able to decide whether one query contains another. That is, over any possible state of the database, are the results of one query always a subset of the other's? In this thesis you will survey query containment results for a given query language (e.g. fragments of SQL, SPARQL, or XPath). You will implement containment algorithms and evaluate their effectiveness over benchmarks.
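
For conjunctive queries, containment is decidable via the classical canonical-database (homomorphism) test: freeze Q1's body into a database and ask whether Q2 can be mapped into it while preserving the head. A brute-force sketch follows, with queries represented as (head variables, list of atoms); the representation is an assumption made for illustration.

```python
from itertools import product

def contained_in(q1, q2):
    """Decide Q1 ⊆ Q2 for conjunctive queries.

    A query is (head_vars, body), where body is a list of
    (relation, argument_list) atoms over variable names. Q1 ⊆ Q2 iff
    some mapping of Q2's variables into Q1's variables sends every
    atom of Q2 to an atom of Q1 and matches the heads positionally.
    """
    head1, body1 = q1
    head2, body2 = q2
    facts = {(rel, tuple(args)) for rel, args in body1}   # canonical database
    terms1 = sorted({v for _, args in body1 for v in args})
    vars2 = sorted({v for _, args in body2 for v in args})
    for image in product(terms1, repeat=len(vars2)):
        h = dict(zip(vars2, image))
        if [h.get(v, v) for v in head2] != list(head1):
            continue                                      # head mismatch
        if all((rel, tuple(h[a] for a in args)) in facts
               for rel, args in body2):
            return True                                   # homomorphism found
    return False
```

For example, Q1(x) :- R(x,y), R(y,x) is contained in Q2(x) :- R(x,y), but not vice versa.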

**Supervisor:** Michael Minock

**Title: Gamification of NLI Acquisition**

**Subject:** Several researchers have explored the possibility of using computer games to acquire the knowledge needed to build Natural Language Interfaces (NLIs). In this thesis, review earlier attempts, build your own gamification prototype and evaluate it.

**Supervisor:** Michael Minock

**Title:** **Natural Language Interface technology in Computer Games**

**Subject:** Early in the history of computer gaming there were many delightful text-based adventure games, in which the user made their way through the game by typing natural language commands. Have such interfaces been integrated into more modern games and, if so, how? After you survey developments, implement a small game using NLI techniques and evaluate it, or variations of it, with a group of users. Are NLIs ‘fun’? What are the future prospects and opportunities for NLIs in computer games?

**Supervisor:** Michael Minock

**Title:** **Replication of Precise**

**Subject:** The paper 'Towards a theory of natural language interfaces to databases' by Popescu, Etzioni and Kautz, Proc of IUI, 2003, developed a novel method to support natural language interfaces to databases. Over the graph of tokens representing the database, max-flow is computed over the tokens represented in the user's natural language query. From the resulting sub-graph an SQL expression is trivially generated. Crucially, the approach claims that it can identify 'semantically tractable' queries that can be answered with 100% confidence. The paper presented results over the GEO corpus. In this work, attempt to replicate their results and then critically evaluate their claims.

**Supervisor:** Michael Minock

**Title:** **Algorithms and Complexity for Video and Computer Games**

**Theme:** Algorithms & Complexity

**Subject and possible essay projects:** Pick your favourite video or computer game, formulate it as a computational problem, and determine whether the problem is tractable; e.g. show that it can be solved in polynomial time, that it is NP-hard or PSPACE-complete, or that it admits an approximation algorithm. For examples of solved problems, see [1].

**References:**

[1] Classic Nintendo Games are (Computationally) Hard, paper by Greg Aloupis, Erik D. Demaine, Alan Guo, and Giovanni Viglietta.

**Supervisor:** Danupon Nanongkai

**Title:** **Scheduling Modern Elevators**

**Theme:** Algorithms / Artificial Intelligence

**Subject:** Modern elevators are very different from those of the past; e.g. passengers can indicate their desired floors when they call the elevator, there may be many cars in one shaft, and a car can move both vertically and horizontally. We will explore the potential of this new technology for serving passengers more efficiently and imagine what the elevator of the future should look like in order to become even more efficient.

**Possible essay projects:** Understand and compare the advantages and disadvantages of different technologies and scheduling algorithms; e.g. one can experimentally explore the efficiency of different algorithms for scheduling the cars, or formulate an algorithmic problem and theoretically study its tractability.

**Supervisor:** Danupon Nanongkai

**Title: Beyond the stable marriage problem**

**Theme:** Algorithms

**Subject:** The stable marriage problem is a well-known problem originating in economics, with several applications (some researchers received a Nobel prize for studying this problem [1]). The goal of the problem is to match two groups of people (e.g. men and women) so that no two people would rather be matched with each other than with their assigned partners. While this problem is pretty well understood, its variations arising from related applications are not. The goal of this project is to theoretically study these variations, e.g. popular matching [2] and stable matching with ties [3]. For more examples, see [4].

**Possible essay projects:** Formulate new problems, or try to improve existing algorithms for previously studied problems. For these problems, come up with a few possible conjectures and study directions for proving them.
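
The natural starting point is the Gale-Shapley deferred-acceptance algorithm, which always produces a stable matching when preference lists are strict and complete; the variations above (ties, popularity) are precisely where it stops being sufficient. The instance in the usage example is invented.

```python
def gale_shapley(men_prefs, women_prefs):
    """Men-proposing deferred acceptance.

    men_prefs / women_prefs: dict mapping each person to a strict
    preference list (most preferred first). Returns a stable matching
    as a dict woman -> man.
    """
    rank = {w: {m: i for i, m in enumerate(prefs)}
            for w, prefs in women_prefs.items()}
    next_choice = {m: 0 for m in men_prefs}   # next woman m proposes to
    engaged = {}                              # woman -> current partner
    free_men = list(men_prefs)
    while free_men:
        m = free_men.pop()
        w = men_prefs[m][next_choice[m]]
        next_choice[m] += 1
        if w not in engaged:
            engaged[w] = m
        elif rank[w][m] < rank[w][engaged[w]]:
            free_men.append(engaged[w])       # w trades up; her ex is free
            engaged[w] = m
        else:
            free_men.append(m)                # w rejects m; he tries again
    return engaged
```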

**References:**

[1] Stable matching: Theory, evidence, and practical design

[2] Popular matching, paper by Abraham, Irving, Kavitha, and Mehlhorn.

[3] An improved approximation algorithm for the stable marriage problem with one-sided ties, paper by Chien-Chung Huang and Telikepalli Kavitha

[4] 3rd International Workshop on Matching Under Preferences

**Supervisor:** Danupon Nanongkai

**Title:** **Mining Compressed Graphs**

**Theme:** Algorithms / Data Mining

**Subject:** Data mining algorithms on large graphs such as social networks and the web usually assume that the graph is represented in a standard form (i.e. by an adjacency list or matrix). However, big technology companies such as Google do not store graphs this way, but in a *compressed form*. The goal of this project is to explore how we can compress graphs and at the same time still mine them easily. See [1] for an example of research in this direction.

**Possible essay projects:** Experimentally and/or theoretically explore the advantages of different ways to compress graphs. Study algorithms for mining specific graph properties, such as connectivity, expansion, and PageRank.
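
A first taste of the trade-off: neighbour lists can be sorted and stored as variable-length encoded gaps, shrinking the representation considerably, while any mining algorithm must then decode neighbours on the fly. The varint scheme below is a simplified version of what systems like WebGraph use; real compressors add reference lists and interval codes.

```python
def encode_adjacency(neighbors):
    """Encode a sorted neighbour list as varint-coded gaps.

    Each gap is emitted in 7-bit groups, the high bit of a byte
    meaning 'more bytes follow' -- usually far below 4 bytes per id.
    """
    out, prev = bytearray(), 0
    for v in neighbors:
        gap, prev = v - prev, v
        while True:
            byte, gap = gap & 0x7F, gap >> 7
            out.append(byte | (0x80 if gap else 0))
            if not gap:
                break
    return bytes(out)

def decode_adjacency(data):
    """Inverse of encode_adjacency: stream the neighbour ids back."""
    neighbors, value, shift, prev = [], 0, 0, 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:            # last byte of the current gap
            prev += value
            neighbors.append(prev)
            value, shift = 0, 0
    return neighbors
```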

**References:**

[1] Speeding up Algorithms on Compressed Web Graphs, paper by Chinmay Karande, Kumar Chellapilla, and Reid Andersen.

**Supervisor:** Danupon Nanongkai

**Title:** **Mining Distributed Graphs**

**Theme:** Algorithms / Data Mining

**Subject:** Graphs nowadays are extremely large and are usually stored across several computers. It is challenging to mine these distributed graphs, and many new paradigms for mining them, e.g. Google’s MapReduce [1,2] and Pregel [3], have been invented. The goal of this project is to study these paradigms and create new (hopefully better) paradigms for dealing with distributed graphs.

**Possible essay projects: **Experimentally and/or theoretically explore advantages of different paradigms for dealing with distributed graphs. Study algorithms for mining specific graph properties, such as connectivity, expansion, and PageRank.

**References:**

[1] MapReduce: simplified data processing on large clusters, paper by Dean and Ghemawat.

[2] A Model of Computation for MapReduce, paper by Karloff, Suri, and Vassilvitskii.

[3] Pregel: a system for large-scale graph processing, paper by Malewicz et al.

**Supervisor:** Danupon Nanongkai

**Title:** **Mining Evolving Graphs**

**Theme:** Algorithms / Data Mining

**Subject:** Graph data, such as social networks and the web, is constantly changing, and cannot be handled by standard graph algorithms, which usually assume a static graph. The goal of this project is to study different ways that graphs can change and explore algorithms for handling these changes efficiently. See [1-3] as starting points.
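
One restricted kind of change, edge insertions only, can already be handled with a classical data structure. The Python sketch below uses union-find to answer connectivity queries as the graph grows; fully dynamic algorithms that also support deletions (see [2]) are much harder.

```python
# Incremental connectivity with union-find (path halving, no ranks):
# supports edge insertions and "are u and v connected?" queries.

parent = {}

def find(x):
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]   # path halving
        x = parent[x]
    return x

def insert_edge(u, v):
    parent[find(u)] = find(v)

def connected(u, v):
    return find(u) == find(v)

insert_edge(1, 2)
insert_edge(3, 4)
assert not connected(1, 4)
insert_edge(2, 3)               # merges the two components
assert connected(1, 4)
```

A natural essay experiment is to measure how such incremental structures behave on real evolving graph data compared to recomputing from scratch.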

**Possible essay projects: **Come up with different models for evolving graphs (e.g. one edge may change at a time, one node may appear at one time, etc.) and experimentally and/or theoretically study algorithms for mining graph properties (e.g. connectivity, expansion, and PageRank) on these models.

**References:**

[1] Graph Evolution: Densification and Shrinking Diameters, paper by Leskovec, Kleinberg, and Faloutsos.

[2] Fully Dynamic Connectivity: Upper and Lower Bounds, survey by Italiano.

[3] Graph Stream Algorithms: A Survey, by McGregor.

**Supervisor:** Danupon Nanongkai

**Title:** **Ant Colonies and Bio-inspired computing**

**Theme:** Algorithms

**Subject:** An ant colony is an example of a highly distributed and dynamic system where agents (ants) have to collaboratively achieve goals such as searching for food. This project will study algorithmic problems inspired by ant colonies; see [1, 2] for examples. More generally, one can formulate and study any problem inspired by a biological system; see [3, 4] for examples of research in this direction.

**Possible essay projects: **Formulate a new problem inspired by ant colonies or some other biological system. Study it experimentally or theoretically, e.g. compare different heuristics by running them on some data, or show that the problem can or cannot be solved in polynomial time.

**References:**

[1] Task Allocation in Ant Colonies, paper by Cornejo et al.

[2] Towards More Realistic ANTS, paper by Emek et al.

[3] 2nd Workshop on Biological Distributed Algorithms

[4] Bioinspired computation in combinatorial optimization: algorithms and their computational complexity, paper by Neumann and Witt

**Supervisor:** Danupon Nanongkai

**Title:** **Game Theory for Social Networks**

**Theme:** Algorithms / Game Theory

**Subject:** Social networks are new places where extremely many people can interact in complex ways. We want to understand how people resolve conflicts and cooperate on social networks, and how one can gain advantages and win competitions on them. See [1, 2] for examples of research in this direction.

**Possible essay projects: **Study a previously studied game-theoretic problem, or a new problem motivated by some application (feel free to pick your favourite). The study can be either experimental or theoretical; i.e. you can compare different algorithms run on some real social networks, or prove some theorems.

**References:**

[1] A Game-theoretic Model of Attention in Social Networks, paper by Goel and Ronaghi.

[2] Competitive contagion in networks, paper by Goyal and Kearns.

**Supervisor:** Danupon Nanongkai

**Title: Numerical Algorithms for Technical and Scientific Computation**

#### Keywords: Numerical mathematics, simulations, algorithms

**Subject:** Numerical algorithms are gaining ever increasing importance with the wider use of simulations and numerical calculations in many fields, ranging from basic research in physics and chemistry, through the life and materials sciences, to product development in many engineering disciplines.

The efficient implementation of numerical algorithms on modern computer systems is at the heart of software development for scientific and technical applications and an interesting and challenging task for computer scientists. Knowledge about the implementation of basic algorithms is a prerequisite for the efficient solution of large and complex calculation projects later on.

#### Possible essay projects

The idea of the proposed projects is to dive into the implementation of numerical algorithms. This includes exploring an algorithm's behaviour with software tools tailored for mathematical problems, such as Matlab or Python-based packages. Finally, the algorithm will be implemented, analysed and optimised for modern processors using a compiled programming language, for example C or C++. Specific topics can be selected from different application fields according to the interests and previous knowledge of the student:

- linear algebra, the solution of linear systems,
- calculus, the evaluation of functions, numerical differentiation and integration,
- the solution of differential equations,
- applications of the Fast Fourier Transform (FFT),
- and many more
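
As an example of the exploration phase in Python, the sketch below compares the composite trapezoid and Simpson rules for a simple integral with known exact value. Simpson converges much faster, illustrating why the choice of algorithm matters before any low-level optimisation in C or C++.

```python
# Composite trapezoid vs Simpson rule for integrating sin on [0, pi],
# whose exact value is 2. Trapezoid error shrinks like h^2, Simpson like h^4.
import math

def trapezoid(f, a, b, n):
    h = (b - a) / n
    s = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return h * s

def simpson(f, a, b, n):        # n must be even
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + i * h) for i in range(1, n, 2))
    s += 2 * sum(f(a + i * h) for i in range(2, n, 2))
    return h * s / 3

exact = 2.0
err_trap = abs(trapezoid(math.sin, 0, math.pi, 100) - exact)   # ~1.6e-4
err_simp = abs(simpson(math.sin, 0, math.pi, 100) - exact)     # ~1.1e-8
assert err_simp < err_trap
```

The same comparison, with timings added, is a natural starting point before reimplementing the winner in a compiled language.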

**Supervisor:** Michael Schliephake

**Title:** **Rubik's cube**

**Theme:** Algorithms

**Subject:** Implement and compare different solution methods for Rubik's cube. If the 3x3 cube has already been too well investigated by other authors, how about looking at the 5x5 cube, or other variants (such as the triangular "cube")? A solution would require an efficient way of representing states of the cube, as well as the possible state transformations.
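
One standard way to represent such states, sketched here in Python on a deliberately tiny fragment (four stickers of a single face, not a full cube): a state is a permutation of sticker positions, and each move is itself a permutation applied by composition.

```python
# State = tuple of sticker ids by position; move = permutation telling
# each position which old position its sticker comes from.

def apply_move(state, move):
    return tuple(state[move[i]] for i in range(len(state)))

SOLVED = tuple(range(4))        # four stickers of one face
ROT = (3, 0, 1, 2)              # a quarter-turn of that face

s = apply_move(SOLVED, ROT)
assert s == (3, 0, 1, 2)
for _ in range(3):              # four quarter-turns restore the state
    s = apply_move(s, ROT)
assert s == SOLVED
```

For a real cube the state tuple has 48 or more entries and each face turn is a fixed permutation; the same `apply_move` composition works unchanged, which makes breadth-first or IDA*-style search easy to set up.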

**Supervisor:** Vahid Mosavat

**Title:** **Elevator control strategies**

**Theme:** Algorithms

**Subject: **A program controlling an elevator has to repeatedly decide which floor to go to, based on the requests of people inside the elevator, and the requests of people on different floors pressing the elevator button. Implement an elevator simulator and compare the performance of a number of elevator control strategies. An interesting twist to the problem is to implement a program that learns and improves its strategy through reinforcement learning.
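
The comparison described above can be sketched very compactly. The Python fragment below (a hypothetical, timing-free model: requests are known in advance and cost is total travel distance) compares first-come-first-served with always serving the nearest pending request.

```python
# Two toy elevator strategies on a batch of floor requests, scored by
# total distance travelled from a given starting floor.

def travel(start, order):
    dist, pos = 0, start
    for floor in order:
        dist += abs(floor - pos)
        pos = floor
    return dist

def nearest_first(start, requests):
    pending, pos, order = list(requests), start, []
    while pending:
        nxt = min(pending, key=lambda f: abs(f - pos))
        pending.remove(nxt)
        order.append(nxt)
        pos = nxt
    return order

requests = [9, 1, 8, 2]
fcfs = travel(0, requests)                        # serve in arrival order
greedy = travel(0, nearest_first(0, requests))    # serve nearest first
assert greedy <= fcfs
```

Note the design trade-off this exposes: nearest-first minimises travel here but can starve a distant request indefinitely when new nearby requests keep arriving, which is exactly the kind of behaviour a proper simulator (with request arrival times and waiting-time statistics) would reveal.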

**Supervisor:** Vahid Mosavat

**Title:** **Optimal Yahtzee**

**Theme:** Algorithms

**Subject:** Construct a program that plays Yahtzee well, preferably optimally. The program should, given the current result and the current values of the dice, determine which dice to save or, if all three throws have been used in this round, determine where to put the score. There are several variations of this problem. The simplest is to always maximize the expected number of points you will get; a probably more difficult one is, given both your own and an opponent's current score, to maximize the likelihood of winning, provided the opponent also aims to win. There are also variations depending on the exact rules used.
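
For one very restricted decision (aiming for sixes, always keeping sixes and rerolling the rest) the expected score can be computed in closed form, which the Python sketch below illustrates: each rerolled die ends up a six with probability 1 - (5/6)^r over r remaining rerolls.

```python
# Expected number of sixes when you keep all dice showing six and
# reroll the rest, with `rerolls` throws remaining. Each non-six die
# independently becomes a six with probability 1 - (5/6)**rerolls.

def expected_sixes(current_sixes, dice_left, rerolls):
    return current_sixes + dice_left * (1 - (5 / 6) ** rerolls)

# After the first throw: one six showing, four dice to reroll, two rerolls.
e = expected_sixes(1, 4, 2)
assert abs(e - (1 + 4 * 11 / 36)) < 1e-12   # 11/36 = 1 - (5/6)^2
```

A full optimal player replaces this closed form with an expectimax over all 32 keep-subsets, all dice outcomes, and all remaining scoring categories; the closed form above is the base case of that recursion for a single fixed category.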

**Supervisor:** Vahid Mosavat

**Title:** **Sudoku solvers**

**Theme:** Algorithms

**Subject:** In principle, Sudoku can be solved using a brute-force algorithm, but it is more interesting to find an efficient Sudoku solver. The task is to study and implement some Sudoku-solving algorithms, and hopefully to design your own solution or variation. The algorithms should be tested and benchmarked against Sudoku instances of varying difficulty.
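
The brute-force baseline mentioned above fits in a few lines of Python; anything smarter (constraint propagation, dancing links, SAT encodings) would be benchmarked against it.

```python
# Naive backtracking Sudoku solver. Grid is a 9x9 list of lists, 0 = empty.

def valid(g, r, c, v):
    if v in g[r] or any(g[i][c] == v for i in range(9)):
        return False                     # value already in row or column
    br, bc = 3 * (r // 3), 3 * (c // 3)  # top-left corner of the 3x3 box
    return all(g[br + i][bc + j] != v for i in range(3) for j in range(3))

def solve(g):
    for r in range(9):
        for c in range(9):
            if g[r][c] == 0:
                for v in range(1, 10):
                    if valid(g, r, c, v):
                        g[r][c] = v
                        if solve(g):
                            return True
                        g[r][c] = 0      # undo and try next value
                return False             # dead end: backtrack
    return True                          # no empty cell left: solved

grid = [[0] * 9 for _ in range(9)]       # empty puzzle: any completion works
assert solve(grid) and sorted(grid[0]) == list(range(1, 10))
```

Filling cells in a fixed order with the smallest valid value, as above, is exactly the kind of strategy that degrades badly on hard hand-crafted instances, which makes it a useful benchmark opponent.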

**Supervisor:** Vahid Mosavat

**Title:** **Graph properties of the KTH web** (1 thesis)

**Theme:** Algorithms/Graphs

**Subject:** Graphs have emerged as a powerful tool to study complex systems. The graph of a system describes an interaction between a pair of nodes as an edge, which may have a weight and a direction. The ‘adjacency matrix’ describes all possible pair-wise interactions between the nodes. Such graphical representations can give deep insights into the robustness and dynamical properties of a system (Newman 2003). A typical example of a system that can be studied using graph theory is the World Wide Web. Previously, several studies (e.g. Barabasi 2007, Barabasi et al. 2000) have described the scale-free nature of the internet and its fault tolerance.

**Possible essay projects:**

- Extract the adjacency matrix of the pages in the KTH web
- Study the degree distribution and eigenvalue spectrum of the adjacency matrix
- Evaluate graph descriptors such as eigenvalue centrality, k-shell index, clustering coefficients, network diameter, driver nodes, etc.
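
Two of the descriptors above can be sketched on a toy graph before touching the real KTH web data. The dependency-free Python fragment below computes the degree distribution of a four-node graph (a triangle with one pendant node) and estimates the largest adjacency eigenvalue by power iteration; its exact value for this graph is about 2.17.

```python
# Degree distribution and spectral radius of a toy undirected graph,
# stored as an adjacency dict (a stand-in for the KTH web adjacency matrix).
from collections import Counter

adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}

# Degree distribution: how many nodes have each degree.
deg_dist = Counter(len(nbrs) for nbrs in adj.values())   # {3:1, 2:2, 1:1}

# Power iteration for the largest eigenvalue of the adjacency matrix:
# repeatedly apply the matrix and renormalise; the scale factor converges
# to the spectral radius (the graph is connected and non-bipartite).
x = {v: 1.0 for v in adj}
for _ in range(100):
    y = {v: sum(x[u] for u in adj[v]) for v in adj}
    norm = max(abs(val) for val in y.values())
    x = {v: val / norm for v, val in y.items()}
spectral_radius = norm
```

On the real web graph the same two computations reveal the heavy-tailed degree distribution and the dominant eigenvalue that underlies PageRank-style centrality.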

**References:**

- Barabasi A-L (2007) The architecture of complexity. *IEEE Control Systems Magazine* 27(4):33-42
- Barabasi A-L, Albert R, Jeong H (2000) Scale-free characteristics of random networks: the topology of the world wide web. *Physica A* 281:69-77
- Newman M (2003) The structure and function of complex networks. *SIAM Review* 45:167-256

**Supervisor:** Arvind Kumar

**Title:** **Statistical properties of tongue movements during natural speech** (3 possible theses)

**Theme:** Algorithms

**Subject:** Why is it so difficult to learn the accent of a foreign language after a certain age? Vocal output of a language requires well-coordinated and controlled movements of the vocal cords, jaws, lips and tongue. We argue that each language imposes a certain non-random statistics on these movements. And once such statistics has been learned well enough to enable fluent speech, it is difficult to override it. That is, once you have learned and perfected the statistics of the movements needed for Swedish, the brain will tend to follow it even when you are speaking English, which may require slightly but significantly different statistics. To test this hypothesis, we would like to examine the statistics of tongue movements during natural speech in native English speakers (hopefully also in Swedish speakers).

There is a general interest in understanding the statistics of natural movements (Ingram et al. 2009, Belic and Faisal 2011). In fact, just as the statistical properties of the sensory environment shaped its neural representation and processing, it is assumed that the statistical properties of natural movements tell us how they are created in the brain (Ingram et al. 2009). However, despite the importance of speech in our life, there is little (or no) work on the statistics of natural speech movements. Besides the obvious scientific curiosity, this project could provide important first clues about the neural representation of speech and diverse accents, and help build better speech-synthesis devices.

**Possible essay projects:**

- Extract the distribution of tongue movements and identify the most common tongue movements
- Study the transition dynamics of tongue movements
- Extract the relationship between the spectra of the tongue movements and the speech
- Study the inter-subject variability of tongue movement statistics
- Apply dimensionality-reduction methods such as Principal Component Analysis and Locally Linear Embedding

These questions could be divided into three Bachelor's theses: the first comprising topics 1-2, the second topics 3-4, and the third topic 5.
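
For topic 5, the core of Principal Component Analysis fits in a short dependency-free Python sketch: centre the data, form the covariance matrix, and diagonalise it. The 2-D toy data below (a hypothetical stand-in for tongue-movement coordinates) lies nearly on a line, so one component explains almost all variance.

```python
# Pure-Python PCA for 2-D data: the 2x2 covariance matrix is diagonalised
# analytically with the quadratic formula, no linear-algebra library needed.
import math

data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.1), (4.0, 8.0)]   # roughly y ~ 2x
n = len(data)
mx = sum(x for x, _ in data) / n
my = sum(y for _, y in data) / n
cxx = sum((x - mx) ** 2 for x, _ in data) / (n - 1)
cyy = sum((y - my) ** 2 for _, y in data) / (n - 1)
cxy = sum((x - mx) * (y - my) for x, y in data) / (n - 1)

# Eigenvalues of [[cxx, cxy], [cxy, cyy]]: roots of l^2 - tr*l + det = 0.
tr, det = cxx + cyy, cxx * cyy - cxy ** 2
lam1 = tr / 2 + math.sqrt(tr ** 2 / 4 - det)   # variance along 1st component
lam2 = tr / 2 - math.sqrt(tr ** 2 / 4 - det)   # variance along 2nd component

explained = lam1 / (lam1 + lam2)   # close to 1: the data is essentially 1-D
```

Real tongue-trajectory data is high-dimensional, so there one would use an eigendecomposition routine (and, for nonlinear structure, LLE as in the Roweis and Saul reference), but the explained-variance logic is identical.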

**References:**

- Howard IS, Ingram JN, Körding KP, Wolpert DM (2009) The statistics of natural movements are reflected in motor errors. *J Neurophys* 102:1901-1910
- Belic J and Faisal AA (2011) The structured variability of finger motor coordination in daily tasks. *BMC Neuroscience* 12(Suppl 1):P102. doi:10.1186/1471-2202-12-S1-P102
- Tipping ME and Bishop CM (1999) Probabilistic principal component analysis. *J Roy Stat Soc B* 61(3):611-622
- Roweis S and Saul L (2000) Nonlinear dimensionality reduction by locally linear embedding. *Science* 290:2323-2326. https://www.cs.nyu.edu/~roweis/lle/algorithm.html

**Supervisor:** Arvind Kumar (with inputs from Olov Engwall)

**Title:** **Use of compressed sensing in network reconstruction** (1 thesis)

**Theme:** Algorithms

**Subject:** A typical discretisation process involves sampling, guided by the Shannon-Nyquist sampling theorem: to reconstruct a signal, we must sample it ‘regularly’ at a rate at least twice the maximum frequency in the signal. We can then reconstruct the original signal by low-pass filtering the sampled signal. Recently, a new idea has been proposed which does not require regular sampling.

If we are dealing with sparse signals (e.g. images, which are sparse in a suitable basis), we can represent the signal by a random selection of irregularly spaced sample points. The signal can then be reconstructed by minimizing the L1 norm of the solution of the under-determined linear system corresponding to the measurements.

An easy way to think about compressed sensing is through a Sudoku puzzle. A puzzle is uniquely determined by as few as 17 suitably chosen entries of the 9x9 grid, and it has been proven that fewer than 17 never suffice. Knowing only 17 of the 81 entries thus makes solving the puzzle a compressed-sensing-like problem: recovering the full solution, subject to the rules, from a small number of known entries can loosely be thought of as the L1 minimization of an under-determined linear system.

While this technique has become popular in mobile applications and image processing, its potential in the reconstruction of the activity of a network of neurons has not been exploited. In a typical neuroscience experiment, the activity of only a handful of neurons is recorded. Can we use compressed sensing to reconstruct the activity of the whole network? If yes, then what is the minimum number of neurons required? Can this work for any activity state of the network? Obviously these questions cannot be addressed right away using experimental data. But we can ask them using simulations of neuronal networks in which we know the activity of all the neurons and their connectivity.
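
Full L1 minimisation requires a linear-programming solver. As a dependency-free illustration of the same sparse-recovery idea, the Python sketch below instead uses greedy orthogonal matching pursuit (a different but closely related algorithm) to attack a 2-sparse signal in 8 unknowns from only 5 random measurements; with random Gaussian measurements this typically, though not always, recovers the true support.

```python
# Sparse recovery from under-determined measurements b = A x with
# orthogonal matching pursuit: greedily pick the column most correlated
# with the residual, then least-squares refit on the chosen support.
import random

def solve_linear(M, rhs):
    """Gaussian elimination with partial pivoting for small dense systems."""
    n = len(M)
    A = [row[:] + [rhs[i]] for i, row in enumerate(M)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        for r in range(n):
            if r != i:
                f = A[r][i] / A[i][i]
                A[r] = [a - f * c for a, c in zip(A[r], A[i])]
    return [A[i][n] / A[i][i] for i in range(n)]

random.seed(1)
n, m = 8, 5
A = [[random.gauss(0, 1) for _ in range(n)] for _ in range(m)]
x_true = [0.0] * n
x_true[2], x_true[6] = 1.5, -2.0                      # the 2-sparse signal
b = [sum(A[i][j] * x_true[j] for j in range(n)) for i in range(m)]

support, r = [], b[:]
for _ in range(2):
    j = max((j for j in range(n) if j not in support),
            key=lambda j: abs(sum(A[i][j] * r[i] for i in range(m))))
    support.append(j)
    G = [[sum(A[i][a] * A[i][c] for i in range(m)) for c in support]
         for a in support]
    rhs = [sum(A[i][a] * b[i] for i in range(m)) for a in support]
    coef = solve_linear(G, rhs)
    r = [b[i] - sum(coef[t] * A[i][support[t]] for t in range(len(support)))
         for i in range(m)]

x_hat = [0.0] * n
for t, j in enumerate(support):
    x_hat[j] = coef[t]                   # estimated sparse signal
```

In the proposed project the unknown vector would be the activity of all neurons in a simulated network and the measurements the recorded subset, with a proper L1 solver in place of this greedy sketch.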

**Possible essay projects:**

- Reconstruction of full network activity from the activity of a small number of neurons in a simulation of biological neuronal networks, using compressed sensing. Determine when this is possible, in which network activity states compressed sensing works, and how many neurons are required for the task.

**References:**

- Candès E and Wakin M (2008) An introduction to compressive sampling. IEEE Signal Processing Magazine, 25(2):21-30
- http://dsp.rice.edu/cs

**Supervisor:** Arvind Kumar

**Title: Cancer tumor heterogeneity and clustering.**

For several years now, biology labs have been generating genomic data from cancer tumors. The genome of a cancer cell typically contains many mutations: some consist of a changed nucleotide, some of a deletion or duplication of a longer region, and some of other structural changes of the genome. The mutations are often partitioned into driver mutations, which affect the phenotype (the functional characteristics and behavior), and passenger mutations, which do not.

Some of this data is bulk data in the sense that there actually are subpopulations of cells in the tumor with radically different genomes. Recently, several machine learning methods have been suggested for deconvoluting such data, that is, separating the bulk data into clusters representing subpopulations. Based on existing literature and your own ideas, implement one such method as well as a generator for synthetic data (similar to the real bulk data). Apply your algorithm to synthetic as well as real data.
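
A toy version of both halves of the task (an illustrative sketch, not any published method): simulate variant allele frequencies (VAFs) from two hypothetical subclones, then separate them with 1-D k-means.

```python
# Synthetic-data generator + trivial deconvolution. Two subclones present
# in 60% and 20% of cells give expected VAFs around 0.30 and 0.10
# (heterozygous mutations); Gaussian noise stands in for sequencing noise.
import random

random.seed(0)
vafs = ([random.gauss(0.30, 0.02) for _ in range(50)] +
        [random.gauss(0.10, 0.02) for _ in range(50)])

centers = [min(vafs), max(vafs)]          # simple 2-means initialisation
for _ in range(20):
    clusters = [[], []]
    for v in vafs:
        clusters[0 if abs(v - centers[0]) < abs(v - centers[1]) else 1].append(v)
    centers = [sum(c) / len(c) for c in clusters]

lo, hi = sorted(centers)    # estimated subclone VAFs, near 0.10 and 0.30
```

Real deconvolution methods replace k-means with probabilistic mixture models that account for copy number, purity, and read-depth noise, but the generate-then-cluster evaluation loop is the same.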

**Supervisor:** Jens Lagergren

**Title: Somatic evolution in cancer**

Today, biology labs are generating single-cell genomic data from cancer tumors. The genome of a cancer cell typically contains many mutations: some consist of a changed nucleotide, some of a deletion or duplication of a longer region, and some of other structural changes of the genome. Based on these genomes, a phylogenetic tree representing the evolutionary history of the cells can be inferred. Based on existing literature and your own ideas, implement one such method as well as a generator for synthetic data (similar to the real data). Apply your algorithm to synthetic as well as real data.

**Supervisor:** Jens Lagergren

**Title: Sub-population dynamics in tumors.**

There are several strongly simplified stochastic processes that capture various aspects of how a tumor grows. Two aspects are especially interesting. First, how different subpopulations arise and become the dominant type due to higher fitness (basically, efficiency of proliferation). Second, how different models of cell division, and their parameters, affect tumor growth, and how well the resulting model behavior fits available biological data. Implement variations of the models and study their behavior and how it fits data.

**Supervisor:** Jens Lagergren