Version created by Pawel Herman 2016-01-12 22:13

Project proposals

Please email your preferred project options to dd143x.pawel@gmail.com.

Project ideas

Title: Sentiment Classification in Social Media

Theme: Natural language processing, Classification

Subject:  The proliferation of social media has generated much interest amongst scientists and within industry - people's opinions are freely expressed and this carries value for many people.  However, social media is a challenge to study as despite the availability of data, extracting interesting information is difficult due to many reasons including: short form communication, abbreviations, slang words, lack of capitals, sarcastic humour, and poor spelling, punctuation and grammar. One type of information of value is sentiment - the emotional payload of a unit of social media. Two prevailing approaches attempt to solve the challenge of sentiment classification: machine-learning and lexicon-based methods.  This project will investigate these approaches with a view to comparing their performance using datasets that contain gold standard human judgements of sentiment.
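
As a minimal sketch of the lexicon-based approach mentioned above, the following Python scores a message by summing per-word polarities. The tiny lexicon, the simple negation rule and the function name `lexicon_sentiment` are illustrative assumptions and do not come from any particular published method.

```python
import re

# Hypothetical toy lexicon (word -> polarity); real lexicons are far larger.
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "awful": -2, "hate": -2}
NEGATIONS = {"not", "never", "no"}


def lexicon_sentiment(text):
    """Return 'positive', 'negative' or 'neutral' for a short message."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    negate = False
    for token in tokens:
        if token in NEGATIONS:
            # Remember the negation until the next sentiment-bearing word.
            negate = True
            continue
        polarity = LEXICON.get(token, 0)
        if polarity != 0:
            score += -polarity if negate else polarity
            negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A machine-learning classifier would replace the fixed lexicon with weights learned from labelled data; comparing both on the same gold-standard data set is the kind of evaluation the project describes.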

Supervisor: Richard Glassey

 

Title: Detecting Visual Plagiarism

Theme: Computer vision, Classification

Subject: Textual plagiarism detection is routinely used within universities to provide evidence that students have committed plagiarism within their reports.  However, visual artefacts within a report are often discarded during analysis.  Clearly, building a visual corpus demands more resources than a textual corpus, but this topic raises the concern: if a picture paints 1000 words, should we not be equally concerned about its plagiarism potential?  As reports may combine several types of picture (photograph, diagram, graph) and may contain deliberate attempts to obfuscate the true source of a picture (rotation, scaling, cropping), several complementary methods may be required to arrive at a solution that balances accuracy with reasonable performance.

Supervisor: Richard Glassey


 

Title: Are They Paying Attention?

Theme: Computer vision

Subject: A concern for a lecturer is whether students are paying attention during a lecture; various context switches can recapture attention, and knowing when to deploy them would be a benefit.  However, this can be difficult to gauge for various reasons - large class sizes, low lighting conditions and a dependence upon technology to capture data for analysis in the first place.  One potential solution is to apply face detection in combination with image capture (stills or video).  Face detection is now familiar from social media and photo editing applications; this project considers if and how it could be used to measure 'group attention' - that is, given a crowd of size n, what proportion of n is paying attention, and how stable this is across the duration of a lecture.  Ideally, this project will investigate the limits of state-of-the-art face-detection algorithms, whilst also considering the engineering challenges of delivering such a project.
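
The 'group attention' measure can be sketched in a few lines of Python. The sketch assumes that some face detector (not shown) has already produced the number of frontal faces per captured frame, and that a detected frontal face is an acceptable proxy for attention; both are assumptions made for illustration, as is the function name `group_attention`.

```python
from statistics import mean, pstdev


def group_attention(frontal_face_counts, class_size):
    """Summarise attention over a lecture.

    frontal_face_counts: detected frontal faces per frame (hypothetical
    detector output). Returns (mean proportion, standard deviation of the
    proportion across frames); a low deviation means stable attention.
    """
    if class_size <= 0:
        raise ValueError("class_size must be positive")
    # Cap each count at the class size to guard against false detections.
    proportions = [min(count, class_size) / class_size
                   for count in frontal_face_counts]
    return mean(proportions), pstdev(proportions)
```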

Supervisor: Richard Glassey

 


 

Title: Brain signal pattern recognition

Theme: Algorithms, Machine learning, Classification, Signals

Subject:   Pattern recognition and machine learning have significantly advanced the field of biological data analysis leading in consequence to the development of effective diagnostic tools and supporting research efforts. The contribution of novel pattern recognition methods has been particularly appreciated in brain data mining as this new approach allows for exploratory search for spatio-temporal patterns in large quantities of high-dimensional nonstationary recordings of brain activity. The emerging trend is to combine machine learning techniques with brain-inspired computing algorithms to address increasingly demanding objectives of brain signal analysis in novel applications.

Below you can find a set of alternative projects (they can be treated individually or in combination).

Possible essay projects:

  • Develop your own approach or build upon the existing approaches to a specific brain signal pattern recognition problem, e.g.
    • electroencephalographic (EEG) signal classification for a brain-computer interface (BCI),
    • EEG-based epileptic seizure prediction (identifying precursors in high-dimensional brain signal recordings)
    • automated sleep scoring based on physiological signals including EEG
  • Alternatively, select and compare a few existing state-of-the-art methods. Focus on selected aspects of a brain signal pattern recognition problem of your choice (handling signal, extracting patterns, classifying and interpreting brain signal correlates)
  • Discuss key challenges, emerging trends and propose future applications for brain signal recognition methodology.

 Supervisor: Pawel Herman

 

Title: Computer-aided medical diagnostics

Theme: Artificial intelligence, Classification, Machine learning, Algorithms

Subject:   Computer-aided diagnosis has been extensively validated in various medical domains, ranging from biomedical image or signal analysis to expert systems facilitating the process of decision making in clinical settings. Although the usefulness of computational approaches to medical diagnostics is beyond any doubt, there is still a lot of room for improvement to enhance the sensitivity and specificity of algorithms. The diagnostic problems are particularly challenging given the complexity as well as diversity of disease symptoms and pathological manifestations. In the computational domain, a diagnostic problem can often be formulated as a classification or inference task in the presence of multiple sources of uncertain or noisy information. This pattern recognition framework lies at the heart of medical diagnostics projects proposed here.

Below you can find a set of alternative projects (they can be treated individually or in combination).

Possible essay projects:

  • Define a diagnostic problem within the medical domain and examine the suitability of machine learning, connectionist (artificial neural network-based), statistical or soft computing methods to your problem.
  • Survey the state-of-the-art in computational tools supporting classification of disease symptoms and comparatively examine the diagnostic performance of some of them on a wide range of available benchmark data sets. Define a measure for diagnostic performance.
  • Discuss most recent trends in the field and address some of the urgent challenges for computer-assisted diagnostics in medicine.
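
One possible starting point for a diagnostic performance measure is the pair of sensitivity and specificity named in the subject text. The sketch below assumes binary labels (1 for diseased, 0 for healthy); the function name and label encoding are illustrative choices.

```python
def sensitivity_specificity(y_true, y_pred):
    """Compute (sensitivity, specificity) from binary labels.

    y_true, y_pred: equal-length sequences of 0 (healthy) / 1 (diseased).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    # Undefined ratios (no positives or no negatives) are reported as NaN.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity
```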

 Supervisor: Pawel Herman

 

Title: Brain-inspired (biomimetic) computing

Theme: Algorithms, Connectionist systems, Networks

Subject: Current computational methods, algorithms and available software make computers particularly effective at bookkeeping, solving complex but well-defined problems, searching for specific patterns etc. However, today’s computers perform rather poorly in tasks that require multi-modal perception to identify complex, undefined patterns in large data streams, or that call for common-sense reasoning and handling of ambiguity, possibly at the cost of precision or speed. Such capabilities are needed to solve a range of real-world problems effectively and to meet growing demands for robustness, flexibility, adaptation and computational scalability (for example, the big data challenge). Brain- or neural-inspired computing is an emerging field of research that aims to design such efficient algorithms based on the principles that the nervous system, and the brain in particular, uses to process generic information. In the family of neural network based (connectionist) systems the focus is on the biomimetic nature of the network architectures and learning mechanisms. Some of these connectionist methods devised in the realm of brain-inspired computing achieve human-competitive performance in recognition tasks.

In this project, students are first required to familiarise themselves with the state-of-the-art methods in the emerging field of computational methods inspired by the brain’s neural networks (connectionist approach) or more general cognitive architectures. Using this background information, students will be in a position to devise their own method or build upon the existing techniques to address a selected computational problem. The range of potential applications is wide: it includes, among others, problems with high-dimensional (multivariate) data having complex relationships, which require explorative search for interesting multi-dimensional patterns with potentially hierarchical structure (low-level features that serve as building blocks for high-level data representations) and offer the possibility to perform a classification task or make inferences. Some examples of broad applications are image analysis (or pattern recognition in general), speech recognition, data mining (e.g. medical, financial, industrial), high-dimensional time-series prediction etc. In the course of the project, special attention should be paid to the scale of the developed computational algorithm, implementation challenges, modularisation and, most importantly, functionality (robustness to noisy conditions, flexibility, effective learning from the environment, capability to handle unsupervised or semi-supervised learning scenarios etc.).

Supervisor: Pawel Herman

 

Title: Automated scheduling, e.g. university timetabling

Theme: Artificial intelligence, Machine learning, Algorithms, Optimisation

Subject:   Planning is one of the key aspects of our private and professional life. Whereas planning our own daily activities is manageable, scheduling in large multi-agent systems with considerable amounts of resources to be allocated in time and space subject to a multitude of constraints is a truly daunting task. In consequence, scheduling and timetabling, as prime representatives of hard combinatorial problems, are increasingly addressed algorithmically using the computational power of today's computers. This computer-assisted practice in setting up timetables for courses, students and lecturers has also gained a lot of interest at universities around the world and still constitutes an active research topic.

In this project, students can address a scheduling problem of their own choice or they can use available university timetabling benchmark data and tailor it to the project's needs. An important aspect of such a project would be to select or compare different algorithms for combinatorial optimisation, and define a multi-criterion optimisation objective. It could be an opportunity to test computational intelligence and machine learning methodology.

Supervisor: Pawel Herman

 

Title: Stock forecasting, financial data mining

Theme: Machine learning, Algorithms, Time-series prediction

Subject: Stock trading is one of the most common economic activities in the world. Stock prices are very dynamic and commonly undergo quick changes due to the intrinsic nature of the financial domain. From a computational perspective, intelligent trading can be formulated as a data (or more specifically, time-series) prediction problem that involves both known parameters and unknown factors. The overarching idea is to design an algorithm that provides accurate predictions, thus allowing for optimal trading decisions. In this domain, machine learning and soft computing methods have recently shown great potential.

Possible essay projects:

  • Compare some of the most recent approaches to financial time series prediction and validate their performance on available benchmark data sets (e.g., http://www.stockhistoricaldata.com/download).
  • Propose a method with your own novel component and verify its suitability for the problem of financial time series analysis on different benchmark data sets.
  • Concentrate on one or two of the emerging challenges in computer-based stock forecasting and address them using selected methodology. Reflect on the statistical nature of the data that reflects complex characteristics of financial markets.

Supervisor: Pawel Herman

 

Title: Intelligent control systems

Theme: Machine learning, Artificial intelligence, Algorithms, Control, Soft computing

Subject: There is a clear trend for smarter machines that are able to collect data, learn, recognize objects, draw conclusions and perform behaviors to emerge in our daily life. Advanced intelligent control systems affect many aspects of human activities and can be found in a wide range of industries, e.g. healthcare, automotive, rail, energy, finance, urbanization and consumer electronics among others. By adapting and emulating certain aspects of biological intelligence this new generation of control approaches makes it possible for us to address newly emerging challenges and needs, build large-scale applications and integrate systems, implement complex solutions and meet growing demand for safety, security and energy efficiency.

Possible essay projects:

  • Select a real-world control problem (traffic control, energy management, helicopter or ship steering, industrial plant control, financial decision support and many others) and propose a new approach using machine learning and soft computing methodology (computational intelligence) that enhances functionality, automatisation and robustness when compared to classical solutions.
  • Demonstrate functional (and other) benefits of “computationally intelligent” control approaches in relation to the classical methodology in a range of low-scale control problems (benchmarks). Discuss a suitable framework of comparison and potential criteria.
  • Consider a robotic control application with all constraints associated with autonomous agents and real-world environments (which can be emulated in software). Propose “computationally intelligent” methods to enable your robot agent prototype to robustly perform complex tasks (learn from the environment, evolve over time, find solutions to new emerging problems and adapt to new conditions among others).

Supervisor: Pawel Herman

 

Title: Web document classification

Theme: Pattern recognition, Machine learning, Algorithms, Statistics, Computational linguistics

Subject:

Possible essay projects:

Supervisor: Pawel Herman

 

Title: Visualisation of neural data

Theme: Visualisation / Simulations

Subject: Visualisation is one of the most neglected aspects of the rapidly developing field of computational biology. Only recently can we observe an emerging trend for combining neural simulation frameworks with visualisation software. Still, there is a plethora of challenging problems that need to be urgently addressed (high-dimensional data, pre-processing, integration with simulation software, demands for purely visual aspects, interactive environment) to render visualisation a practical tool in computational studies. This is envisaged to facilitate computational modelling and assist in demonstrating scientific findings.

Possible essay projects:

  • Visualisation of existing data produced by models (different types of high-dimensional spatiotemporal data are available).
  • Conceptual integration with a simulation environment to help with data pre-processing (or post-processing) and facilitate an interactive mode with the user.
  • Review of the state-of-the-art methodology and a motivated choice of a tool for the computational problem at hand.

Supervisor: Pawel Herman

 

Title: Optimisation and parameter search in computational modelling

Theme: Algorithms

Subject: A model's parameters have a decisive effect on its behaviour and dynamics. The search for parameters is at the same time the most tedious component of computational modelling. Neural simulations are no exception. On the contrary, since they account for nonlinear and stochastic effects in brain data, parameters need to be carefully tuned to obtain a desirable functional and/or dynamical outcome. This optimisation procedure is commonly carried out manually on a trial-and-error basis. It is thus desirable to automate this tedious process by providing an effective parameter search and optimisation scheme. Key challenges to address are the computational efficiency of the implemented method and the definition of a cost function based on the existing "manual" evaluation criteria. Tests in the project will be performed with the use of existing neural models, or a small-scale simulation demo will be developed.

Possible essay projects:

  • Define the cost function that reflects the fundamental model evaluation criteria and propose an effective way of its calculation.
  • Propose a computationally effective way of evaluating the cost function (see the first item).
  • Review and propose a parameter search method (from the existing approaches) that matches the specificity of computational modelling.
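
To make the idea of automated parameter search concrete, here is a minimal random-search sketch. Random search is only one of many existing approaches and is chosen here for brevity; the cost function is supplied by the caller, and the names `random_search`, `bounds` and `n_trials` are illustrative assumptions.

```python
import random


def random_search(cost, bounds, n_trials=200, seed=0):
    """Minimise cost(params) by uniform random sampling inside bounds.

    cost: callable taking a list of floats and returning a float.
    bounds: list of (low, high) pairs, one per parameter.
    Returns (best_params, best_cost).
    """
    rng = random.Random(seed)  # fixed seed keeps runs reproducible
    best_params, best_cost = None, float("inf")
    for _ in range(n_trials):
        params = [rng.uniform(low, high) for low, high in bounds]
        value = cost(params)
        if value < best_cost:
            best_params, best_cost = params, value
    return best_params, best_cost
```

In a modelling setting each call to `cost` would run a simulation, so the number of trials is usually the dominant expense; this is why the efficiency of evaluating the cost function matters.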

Supervisor: Pawel Herman

 

Title: Multi-scale brain simulations

Subject: Simulations of neural systems and the brain can be performed on different scales, e.g. we could try to simulate every single neuron in as much detail as possible. We could also assume populations of neurons to be the basic computational units and neglect the dynamics of single neurons. Libraries exist to simulate neural systems at several of these scales (e.g. GENESIS, NEURON, NEST, Nexa). These can be called from C++ or Python and be run on desktop machines as well as on supercomputers.

 Possible essay projects:

  • Implement some basic models in available simulators.
  • Compare the type of simulations they can perform – what is the “correct” scale to perform brain simulations?
  • As these simulations get closer to mimicking real brains, what are the implications for medicine and computing? Ethical concerns?

 Supervisor: Pawel Herman

 



Title: Visualization and classification of experimental neural data

Theme: Biology, Nervous System, Machine Learning

Subject: In biological experiments, electrical activity in nervous tissue is often recorded in multiple places at the same time.  Multielectrode recording provides a great amount of data but poses the problem of identifying and classifying signals.  A major difficulty arises from the variability of neural responses to a standard stimulus, which is worked around by collecting data from more repetitions. Intelligent data processing should allow identification of individual neurons and groups of neurons by their involvement in the neural response at different phases of the spatio-temporal pattern.

Possible essay projects:

  • Interactive visualization of experimental data
  • Automatic identification of individual neurons in the data pool
  • Classification of neurons by their role in the activity pattern

Supervisor: Alex Kozlov


Title: Analysis and synthesis of neuronal morphology

Theme: Cell Biology, Nervous System, Topology, Statistics, Machine Learning

Subject: Neurons have extremely complicated shapes determined by their role and place in the nervous system. However, despite great individual variability, neurons are classified by morphological types. The rapid growth of public repositories of neuromorphological data allows the development of mathematical methods of neuron classification.

Possible essay projects:

  • Characterization of neuron morphology.
  • Mathematical synthesis of artificial neurons
  • Automatic classification of morphological neuron types?

Supervisor: Alex Kozlov



Title: Multi-objective optimization

Theme: Applied Mathematics, Decision Making, Optimization

Subject: In multi-objective optimization several goal functions are optimized simultaneously.  Because the objectives may inherently conflict, a single solution that is optimal for all of them generally does not exist, and optimality is instead considered as a trade-off between objectives. This gives rise to interesting semi-heuristic approaches and opens opportunities for experimentation.
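
The trade-off between objectives is commonly expressed through Pareto dominance: one solution dominates another if it is no worse in every objective and strictly better in at least one. The following sketch assumes that all objectives are to be minimised and that each candidate is given as a tuple of objective values; the function names are illustrative.

```python
def dominates(a, b):
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points):
    """Return the points that no other point dominates (minimisation)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

For example, among the candidates `(1, 5)`, `(2, 2)`, `(5, 1)`, `(3, 3)` and `(4, 4)`, the last two are dominated by `(2, 2)`, and the remaining three form the trade-off set.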

Possible essay projects:

  • Inverse parameter estimation using multiobjective approach
  • Mathematical synthesis of artificial neurons
  • Evolutionary algorithms for solving multi-objective problems

Supervisor: Alex Kozlov


-------------------------------------------------------------------------------------------------------------------------------

Title: Structured prediction

Subject: An exciting new direction in supervised Machine Learning is to learn functions that map inputs to structures, not just to single values. For example, we might learn mappings from sequences (or bags) of words to trees. Structured learning is particularly promising in domains such as computational linguistics and computational biology. In this thesis you will explore the techniques in this area and evaluate results of various approaches over a domain of your choosing.

Reference:

Search-based structured prediction, Hal Daumé III, John Langford and Daniel Marcu. Machine Learning (2009) 75: 297–325

Supervisor: Michael Minock

Title: Grounding Language in Perception

Quoting Jeffrey Siskind:

Suppose that you were a child. And suppose that you heard the utterance 'John walked to school'. And suppose that when hearing this utterance, you saw John walk to school. And suppose, following Jackendoff (1983), that upon seeing John walk to school, your perceptual faculty could produce the expression GO(John, TO(school)) to represent that event. And further suppose that you would entertain this expression as the meaning of the utterance that you just heard. At birth, you could not have known the meanings of the words 'John', 'walked', 'to', and 'school', for such information is specific to English. Yet, in the process of learning English, you come to possess a mental lexicon that maps the words 'John', 'walked', 'to', and 'school' to representations like 'John', 'GO(x,y)', 'TO(x)', and 'school', respectively. This paper explores one way that this might be done.

In this thesis, you are to replicate (some variant of) Siskind's experiment and comment on its suitability for practical application.
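
The core cross-situational idea can be illustrated with a strongly simplified sketch: for every word, keep only the meaning symbols that were present in every situation in which the word was heard. This is an assumption-laden toy and not Siskind's actual algorithm, which handles noise, homonymy and compositional structure; the data format and the function name are hypothetical.

```python
def cross_situational(observations):
    """Intersect candidate meanings across situations.

    observations: iterable of (words, meaning_symbols) pairs, where
    meaning_symbols are the symbols of the perceived event.
    Returns a dict mapping each word to its surviving candidate symbols.
    """
    candidates = {}
    for words, symbols in observations:
        symbols = set(symbols)
        for word in words:
            if word in candidates:
                # Keep only symbols seen in every situation with this word.
                candidates[word] &= symbols
            else:
                candidates[word] = set(symbols)
    return candidates
```

With the utterances "John walked to school", "John ran" and "Mary walked to school" paired with the symbol sets {John, GO, TO, school}, {John, RUN} and {Mary, GO, TO, school}, the candidates for 'John' shrink to {John}, while 'walked' remains ambiguous between GO, TO and school until more situations are observed.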

Reference:

Jeffrey M. Siskind, 'A Computational Study of Cross-Situational Techniques for Learning Word-to-Meaning Mappings,' Cognition, 61(1-2):39-91, October/November 1996.

Supervisor: Michael Minock

 

Title: Replication of Lambda-WASP

Subject: Baby hears word sequences coupled with baby's pre-linguistic understandings of objects and events in the world. From this baby learns knowledge that lets baby map from word sequences to meaning expressions (natural language understanding) and map from meaning expressions to word sequences (natural language generation). Raymond Mooney's group looked at this problem for natural language understanding in their 2007 paper 'Learning Synchronous Grammars for Semantic Parsing with Lambda Calculus'. Attempt to replicate the results of this work and evaluate their claims. Time permitting, review additional work that has looked at this problem more recently. Critically discuss.

Supervisor: Michael Minock

 

Title: Recursion in PostgreSQL. How much can it do?

Subject: Recursion has been added to SQL in general and to PostgreSQL in particular. But how powerful is it? How does it compare to the various extensions of Datalog (e.g. stratified negation)? How well does it perform? How well does recursion interact with regular view definitions? In this thesis you will deeply explore, document and evaluate recursion in SQL.
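
As a small illustration of recursion in SQL, the sketch below computes graph reachability (a classic Datalog example) with a `WITH RECURSIVE` common table expression. To keep the example self-contained it runs the query through Python's built-in `sqlite3` module; SQLite is only a stand-in here, and the behaviour of PostgreSQL, which is the actual subject of the thesis, would need to be checked separately. The table name `edge` and the function name are hypothetical.

```python
import sqlite3


def reachable_from(edges, start):
    """Return all nodes reachable from start, sorted, using a recursive CTE."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edge (src TEXT, dst TEXT)")
    conn.executemany("INSERT INTO edge VALUES (?, ?)", edges)
    rows = conn.execute(
        """
        WITH RECURSIVE reach(node) AS (
            SELECT ?
            UNION  -- UNION (not UNION ALL) drops duplicates, so cycles end
            SELECT edge.dst FROM edge JOIN reach ON edge.src = reach.node
        )
        SELECT node FROM reach ORDER BY node
        """,
        (start,),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]
```

The corresponding Datalog program would be `reach(X) :- start(X).` and `reach(Y) :- reach(X), edge(X, Y).`, which is one way to frame the comparison asked for above.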

Supervisor: Michael Minock

 

Title: Query Containment

Subject: In a variety of settings (mobile caches, query optimization, etc.) it is useful to be able to decide whether one query contains another. That is, over any possible state of the database, are the results of one query always a subset of the results of the other? In this thesis you will survey query containment results for a given query language (e.g. fragments of SQL, SPARQL or XPath). You will implement containment algorithms and evaluate their effectiveness over benchmarks.
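
For conjunctive queries (select-project-join queries without negation), containment can be decided with the classical homomorphism test: Q1 is contained in Q2 exactly when the variables of Q2 can be mapped onto terms of Q1 so that Q2's head maps to Q1's head and every atom of Q2 maps to an atom of Q1. The sketch below assumes queries without constants, encoded as `(head_variables, body_atoms)` with each atom a `(relation, argument_tuple)` pair; this encoding and the function name are illustrative assumptions.

```python
def contained_in(q1, q2):
    """True if conjunctive query q1 is contained in q2 (homomorphism test)."""
    head1, body1 = q1
    head2, body2 = q2
    if len(head1) != len(head2):
        return False
    # The mapping must send q2's head variables to q1's head variables.
    mapping = {}
    for var2, var1 in zip(head2, head1):
        if mapping.setdefault(var2, var1) != var1:
            return False
    atoms1 = set(body1)

    def extend(i, current):
        # Try to map the i-th atom of q2 onto some atom of q1 (backtracking).
        if i == len(body2):
            return True
        relation, args = body2[i]
        for relation1, args1 in atoms1:
            if relation1 != relation or len(args1) != len(args):
                continue
            candidate = dict(current)
            if all(candidate.setdefault(a, b) == b
                   for a, b in zip(args, args1)):
                if extend(i + 1, candidate):
                    return True
        return False

    return extend(0, mapping)
```

The search is exponential in the worst case, which is expected because containment of conjunctive queries is NP-complete.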

Supervisor: Michael Minock

Title: Gamification of NLI Acquisition

Subject: Several researchers have explored the possibility of using computer games to acquire knowledge to build Natural Language Interfaces (NLIs). In this thesis, review earlier attempts, build your own gamification prototype and evaluate.

Supervisor: Michael Minock

 

Title: Natural Language Interface technology in Computer Games

Subject: Early in the history of computer gaming there were many delightful text-based adventure games. In such games the user made their way through the game by typing in natural language commands. Have such interfaces been integrated into more modern games and, if so, how? After you survey developments, implement a small game using NLI techniques and evaluate it, or variations of it with a group of users. Are NLIs  ‘fun’?  What are future prospects or opportunities for NLIs in computer games?

Supervisor: Michael Minock

Title: Replication of Precise

Subject: The paper 'Towards a theory of natural language interfaces to databases' by Popescu, Etzioni and Kautz, Proc of IUI, 2003, developed a novel method to support natural language interfaces to databases. Over the graph of tokens representing the database, max-flow is computed over the tokens represented in the user's natural language query. From the resulting sub-graph an SQL expression is trivially generated. Crucially, the approach claims that it can identify 'semantically tractable' queries that can be answered with 100% confidence. The paper presented results over the GEO corpus. In this work, attempt to replicate their results and then critically evaluate their claims.

Supervisor: Michael Minock


Title: Ant Colonies and Bio-inspired computing

Theme: Algorithms

Subject: An ant colony is an example of highly distributed and dynamic system where agents (ants) have to collaboratively achieve some goals such as searching for food. This project will study algorithmic problems inspired by ant colonies. See [1, 2] for an example. More generally, one can formulate and study any problem inspired by a biological system. See [3, 4] for an example of research in this direction.

Possible essay projects: Formulate a new problem inspired by ant colonies or a biological system. Experimentally or theoretically study this problem, e.g. compare different heuristics for solving this problem by running them on some data, or show that the problem can or cannot be solved in polynomial time.

References:

[1] Task Allocation in Ant Colonies, paper by Cornejo et al.

[2] Towards More Realistic ANTS, paper by Emek et al.

[3] 2nd Workshop on Biological Distributed Algorithms

[4] Bioinspired computation in combinatorial optimization: algorithms and their computational complexity, paper by Neumann and Witt

Supervisor:


Title: Numerical Algorithms for technical and scientific Computation

Keywords: Numerical mathematics, simulations, algorithms

Subject: Numerical algorithms gain an ever increasing importance together with the wider use of simulations and numerical calculations in many fields. This starts with basic research in physics and chemistry, covers life sciences or material sciences, and does not end in the product development of many engineering sciences.

The efficient implementation of numerical algorithms on modern computer systems is at the heart of software development for scientific and technical applications and an interesting and challenging task for computer scientists. Knowledge about the implementation of basic algorithms is a prerequisite for the efficient solution of large and complex calculation projects later on.

Possible essay projects

The idea of the proposed projects is to dive into the implementation of numerical algorithms. This will include the exploration of the algorithm's behaviour with convenient software tools that have been tailored for mathematical problems like Matlab or some Python-based software packages. Finally, the algorithm will be implemented, analysed and optimised for modern processors using a compiled programming language, for example C or C++. Specific topics can be selected from different application fields and can be chosen according to the interests and the previous knowledge of the student:

  • linear algebra, the solution of linear systems,
  • calculus, the evaluation of functions, numerical differentiation and integration,
  • the solution of differential equations,
  • applications of the Fast Fourier Transform (FFT),
  • and many more
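
As an example of the exploratory first step, numerical integration from the list above can be prototyped in a few lines of Python before moving to a compiled language. The sketch uses the composite trapezoidal rule, chosen here only because it is short; the function name and default step count are illustrative.

```python
def trapezoid(f, a, b, n=1000):
    """Approximate the integral of f over [a, b] with n equal subintervals."""
    if n < 1:
        raise ValueError("n must be at least 1")
    h = (b - a) / n
    # End points carry weight 1/2, interior points weight 1.
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h
```

For smooth integrands the error of this rule shrinks proportionally to h squared, which is the kind of behaviour that can be explored interactively before optimising a C or C++ implementation.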

Supervisor: Michael Schliephake



Title: Graph properties of the KTH web (1 thesis)

Theme: Algorithms/Graphs

Subject: Graphs have emerged as a powerful tool to study complex systems. The graph of a system describes an interaction between a pair of nodes as an edge, which may have a weight and a direction. The ‘adjacency matrix’ describes all the possible pairwise interactions between the nodes. Such graph representations can give deep insights into the robustness and dynamical properties of a system (Newman 2003). A typical example of a system that can be studied using graph theory is the world wide web. Previously, several studies (e.g. Barabasi 2007, Barabasi et al. 2000) have described the scale-free nature of the internet and its fault tolerance.

Possible essay projects:

  • Extract the adjacency matrix of the pages in the KTH web
  • Study the degree distribution and eigenvalue spectrum of the adjacency matrix
  • Evaluate graph descriptors such as eigenvalue centrality, k-shell index, clustering coefficients, network diameter, driver nodes, etc.
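
The first two steps can be sketched as follows, assuming that a crawler (not shown) has already produced a list of pages and of directed hyperlinks between them. The page names used in any example and both function names are hypothetical.

```python
from collections import Counter


def adjacency_matrix(nodes, links):
    """Build a 0/1 adjacency matrix; entry [i][j] is 1 if page i links to j."""
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for src, dst in links:
        matrix[index[src]][index[dst]] = 1
    return matrix


def out_degree_distribution(matrix):
    """Map each out-degree to the number of pages having that out-degree."""
    return Counter(sum(row) for row in matrix)
```

For a graph the size of a university web site a dense list-of-lists matrix quickly becomes wasteful, so a sparse representation would probably be needed in practice.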

References:

  • Barabasi A-L (2007) The architecture of complexity. IEEE Control Systems Magazine. 27(4):33-42
  • Barabasi A-L, Albert R, Jeong H (2000) Scale-free characteristics of random networks: the topology of the world wide web. Physica A 281, 69-77.
  • Newman M (2003) The structure and function of complex networks. SIAM Review 45: 167-256.

Supervisor:


Title: Cancer tumor heterogeneity and clustering.

 For a couple of years now, biology labs have been generating genomic data from cancer tumors. A genome of a cancer cell typically contains many mutations: some consist of a changed nucleotide, some of a deletion or duplication of a longer region, and some of other structural changes of the genome. The mutations are often partitioned into driver mutations, which affect the phenotype (the functional characteristics and behavior), and passenger mutations, which do not.

Some of this data is bulk data in the sense that there actually are subpopulations of cells in the tumor with radically different genomes. Recently several machine learning methods have been suggested for deconvoluting such data, that is, separating the bulk data into clusters representing subpopulations. Based on existing literature and your own ideas, implement one such method as well as a generator for synthetic data (similar to the real bulk data). Apply your algorithm to synthetic as well as real data.

Supervisor: Jens Lagergren

 


 Title: Somatic evolution in cancer 

Today, biology labs are generating single cell genomic data from cancer tumors. A genome of a cancer cell typically contains many mutations: some consist of a changed nucleotide, some of a deletion or duplication of a longer region, and some of other structural changes of the genome. Based on these genomes, a phylogenetic tree representing the evolutionary history of the cells can be inferred.  Based on existing literature and your own ideas, implement one such method as well as a generator for synthetic data (similar to the real data). Apply your algorithm to synthetic as well as real data.

Supervisor: Jens Lagergren 


Title: Sub-population dynamics in tumors.

 There are several strongly simplified stochastic processes that capture various aspects of how a tumor grows. Two particular aspects are especially interesting. First, how different subpopulations arise and become the dominant type due to higher fitness (basically efficiency of proliferation). Second, how different models of cell division, and their parameters, affect the tumor growth and how well the resulting model behavior fits available biological data. Implement variations of the models and study their behavior and how well it fits the data.
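
One strongly simplified process of this kind is a discrete-generation branching model without cell death, sketched below. It is a toy written for illustration and does not correspond to a specific published model: fitness is taken to be the per-generation division probability, a dividing cell's daughter mutates to the next subpopulation with a small probability, and all names and default values are assumptions.

```python
import random


def simulate_growth(generations, fitness, mutation_prob=0.05, seed=0):
    """Simulate subpopulation sizes over time.

    fitness: dict mapping subpopulation name -> division probability per
    generation, ordered from the founding type to later mutants.
    Returns a list of dicts with the cell counts after each generation.
    """
    rng = random.Random(seed)  # fixed seed keeps runs reproducible
    types = list(fitness)
    counts = {t: 0 for t in types}
    counts[types[0]] = 1  # the tumor starts from a single founder cell
    history = [dict(counts)]
    for _generation in range(generations):
        new_counts = dict(counts)
        for i, cell_type in enumerate(types):
            for _cell in range(counts[cell_type]):
                if rng.random() < fitness[cell_type]:
                    # The cell divides and adds one daughter cell.
                    daughter = cell_type
                    if i + 1 < len(types) and rng.random() < mutation_prob:
                        daughter = types[i + 1]
                    new_counts[daughter] += 1
        counts = new_counts
        history.append(dict(counts))
    return history
```

Running it with a slowly dividing founder type and a faster mutant, and varying `mutation_prob`, gives a first impression of when the fitter subpopulation takes over.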

Supervisor: Jens Lagergren