Version created by Pawel Herman 2014-01-16 21:10

Project proposals

Title: Brain signal pattern recognition

Theme:  Algorithms, Pattern recognition, Machine learning

Subject:   Pattern recognition and machine learning have significantly advanced the field of biological data analysis, leading to the development of effective diagnostic tools and supporting research efforts. The contribution of novel pattern recognition methods has been particularly appreciated in brain data mining, as this approach allows for an exploratory search for spatio-temporal patterns in large quantities of high-dimensional, nonstationary recordings of brain activity. The emerging trend is to combine machine learning techniques with brain-inspired computing algorithms to address the increasingly demanding objectives of brain signal analysis in novel applications.

Below you can find a set of alternative projects (they can be treated individually or in combination).

Possible essay projects:

1)      Survey and examine some of the state-of-the-art methods that involve a machine learning and/or brain-inspired connectionist approach to a well-defined class of brain signal pattern recognition problems (see 2).

2)      Select or propose a method with novel aspects or, alternatively, select and compare a few existing approaches (prototypes) to a specific brain signal pattern recognition problem, e.g. electroencephalographic (EEG) signal classification for a brain-computer interface, search for epileptic seizure precursors in high-dimensional brain signal recordings, or multivariate analysis and identification of distributed spatial patterns of brain activity for diagnostic purposes.

3)      Discuss key challenges and emerging trends, and propose future applications for brain signal recognition methodology.

Supervisor: Pawel Herman

 

Title: Computer-aided medical diagnostics

Theme:  Artificial intelligence, Classification, Machine learning, Algorithms

Subject:   Computer-aided diagnosis has been extensively validated in various medical domains, ranging from biomedical image or signal analysis to expert systems that facilitate decision making in clinical settings. Although the usefulness of computational approaches to medical diagnostics is beyond any doubt, there is still a lot of room to improve the sensitivity and specificity of algorithms. Diagnostic problems are particularly challenging given the complexity and diversity of disease symptoms and pathological manifestations. In the computational domain, a diagnostic problem can often be formulated as a classification or inference task in the presence of multiple sources of uncertain or noisy information. This pattern recognition framework lies at the heart of the medical diagnostics projects proposed here.

Below you can find a set of alternative projects (they can be treated individually or in combination).

Possible essay projects:

1)      Define a diagnostic problem within the medical domain and examine the suitability of machine learning, connectionist (artificial neural network-based), statistical or soft computing methods for your problem.

2)      Survey the state-of-the-art in computational tools supporting classification of disease symptoms and comparatively examine the diagnostic performance of some of them on a wide range of available benchmark data sets. Define a measure for diagnostic performance.

3)      Discuss the most recent trends in the field and the challenges for computer-assisted diagnostics in medicine.
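
As an illustration of what "a measure for diagnostic performance" in project 2 could look like, the sketch below computes sensitivity and specificity, the two quantities named in the subject description, from binary labels. The function name, the label encoding (1 = disease, 0 = healthy) and the toy data are assumptions made for this example only.

```python
def diagnostic_performance(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = disease, 0 = healthy)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Sensitivity: fraction of diseased cases that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: fraction of healthy cases that were correctly cleared.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy labels, invented for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
print(diagnostic_performance(y_true, y_pred))
```

A single scalar measure, if one is needed for ranking methods, would have to combine the two values; which combination is appropriate depends on the diagnostic problem chosen.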

Supervisor: Pawel Herman

 

Title: Automated scheduling, e.g. university timetabling

Theme:  Artificial intelligence, Machine learning, Algorithms, Optimisation

Subject:   Planning is one of the key aspects of our private and professional lives. Whereas planning our own daily activities is manageable, scheduling in large multi-agent systems, with considerable amounts of resources to be allocated in time and space subject to a multitude of constraints, is a truly daunting task. Consequently, scheduling and timetabling, as prime representatives of hard combinatorial problems, are increasingly addressed algorithmically using the computational power of today's computers. This computer-assisted practice of setting up timetables for courses, students and lecturers has also gained a lot of interest at universities around the world and still constitutes an active research topic.

In this project, students can address a scheduling problem of their own choice, or they can use available university timetabling benchmark data and tailor it to the project's needs. An important aspect of such a project would be to select or compare different algorithms for combinatorial optimisation and to define a multi-criterion optimisation objective. It could be an opportunity to test computational intelligence and machine learning methodology.
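
One possible shape for a multi-criterion objective is a weighted sum of hard-constraint violations and soft preferences. The sketch below is only an assumption about how such an objective might be set up: the data layout (course mapped to a slot and a room), the weights and the "late slot" preference are all hypothetical.

```python
from collections import Counter


def timetable_cost(assignment, lecturer_of, late_slots=frozenset(),
                   w_hard=100, w_soft=1):
    """Weighted cost of a timetable; assignment maps course -> (slot, room)."""
    # Hard constraint: a room can host only one course per slot.
    room_use = Counter(assignment.values())
    room_clashes = sum(n - 1 for n in room_use.values() if n > 1)
    # Hard constraint: a lecturer can teach only one course per slot.
    lecturer_use = Counter(
        (lecturer_of[course], slot) for course, (slot, _room) in assignment.items()
    )
    lecturer_clashes = sum(n - 1 for n in lecturer_use.values() if n > 1)
    # Soft preference (hypothetical): avoid slots marked as late.
    late = sum(1 for slot, _room in assignment.values() if slot in late_slots)
    return w_hard * (room_clashes + lecturer_clashes) + w_soft * late


lecturer_of = {"algebra": "A", "logic": "A", "databases": "B"}
clashing = {"algebra": (0, "R1"), "logic": (0, "R2"), "databases": (0, "R1")}
repaired = {"algebra": (0, "R1"), "logic": (1, "R2"), "databases": (2, "R1")}
print(timetable_cost(clashing, lecturer_of))
print(timetable_cost(repaired, lecturer_of, late_slots={2}))
```

Any combinatorial optimiser under comparison can then be scored with the same function, which keeps the comparison between algorithms fair.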

Supervisor: Pawel Herman

 

Title: Stock forecasting, financial data mining

Theme:  Machine learning, Algorithms

Subject:  Stock trading is one of the most common economic activities in the world. Stock prices are very dynamic and commonly undergo quick changes due to the intrinsic nature of the financial domain. From a computational perspective, intelligent trading can be formulated as a data (or more specifically, time-series) prediction problem that involves both known parameters and unknown factors. The overarching idea is to design an algorithm that provides accurate predictions, thus allowing for optimal trading decisions. In this domain, machine learning and soft computing methods have recently shown great potential.

Below you can find a set of alternative projects (they can be treated individually or in combination).

Possible essay projects:

1)      Compare some of the most recent approaches to financial time series prediction and validate their performance on available benchmark data sets (e.g., http://www.stockhistoricaldata.com/download).

2)      Propose a method with your own novel component and verify its suitability for the problem of financial time series analysis on different benchmark data sets.

3)      Discuss urgent challenges and emerging trends in computer-based stock forecasting. Reflect on the statistical nature of the data that reflects complex characteristics of financial markets.
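
When validating predictors on benchmark data (projects 1 and 2), it helps to fix the evaluation protocol first. The sketch below shows one common choice, walk-forward evaluation with mean absolute error, applied to two deliberately naive baselines. The function names, the warm-up length and the price series are assumptions for illustration; they are not taken from any benchmark.

```python
def walk_forward_mae(series, predict, warmup):
    """Mean absolute one-step-ahead error; predict sees only past values."""
    errors = []
    for t in range(warmup, len(series)):
        forecast = predict(series[:t])  # history strictly before time t
        errors.append(abs(series[t] - forecast))
    return sum(errors) / len(errors)


def persistence(history):
    """Naive baseline: tomorrow's price equals today's price."""
    return history[-1]


def moving_average(history, window=3):
    """Naive baseline: mean of the last `window` observed prices."""
    recent = history[-window:]
    return sum(recent) / len(recent)


# Invented prices, for illustration only.
prices = [10.0, 11.0, 12.0, 11.0, 13.0, 14.0]
print(walk_forward_mae(prices, persistence, warmup=3))
print(walk_forward_mae(prices, moving_average, warmup=3))
```

A proposed method is only interesting if it beats such baselines under the same protocol, since the predictor never sees future values during evaluation.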

Supervisor: Pawel Herman

 

Title: Inference about simulated neural systems from spiking data

Theme: Algorithms, Simulations

Subject: The proliferation of neural modelling studies, along with the rapidly expanding dimensions of microscopic and mesoscopic recordings of neuronal activity in the brain, has not yet been matched by effective algorithms for processing them. It is particularly insightful to study the so-called spiking dynamics exhibited by individual cells, as they provide direct evidence about functional and structural aspects of the underlying neural circuits. The problem amounts to parallel processing of high-dimensional nonstationary point processes and designing a model for inference about various characteristics of the neural source (system) of interest. The methods investigated in the project will be evaluated on existing data sets generated in neural simulations.

Below you can find a set of alternative projects (they can be treated individually or in combination).

Possible essay projects:

1)      Address one of the urgent and computationally demanding problems in spiking data analysis, e.g. massively parallel search for spatio-temporal patterns, identification of complex synchronous effects and their validation, or clustering and quantifying the distance between time series of spikes.

2)      Provide a tool for fundamental analysis of spike data and visualisation of the results.

3)      Probabilistic (or alternative, including heuristic) modelling of an inference mechanism for deducing various characteristics (to be closely defined) of the underlying spike-generating system (a spiking neural model of cortical circuits). For example, the structure of neural connectivity (network structure) can be inferred from spiking data, or a mode of operation can be deduced. The problem and its difficulty level will be adjusted to the project and the student's requirements on an individual basis.
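
For the problem of quantifying the distance between time series of spikes mentioned in project 1, one established option is the Victor-Purpura spike-train metric. It is named here as an example only; the proposal does not prescribe a particular metric. The sketch assumes spike times are given as sorted lists in seconds and that `q` is the cost per second of shifting a spike.

```python
def victor_purpura(train_a, train_b, q):
    """Victor-Purpura distance between two sorted lists of spike times.

    Inserting or deleting a spike costs 1; shifting a spike by dt costs q * |dt|.
    """
    n, m = len(train_a), len(train_b)
    # dist[i][j] = distance between the first i spikes of a and first j spikes of b.
    dist = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = float(i)  # delete all i spikes
    for j in range(1, m + 1):
        dist[0][j] = float(j)  # insert all j spikes
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            shift = q * abs(train_a[i - 1] - train_b[j - 1])
            dist[i][j] = min(dist[i - 1][j] + 1.0,       # delete a spike
                             dist[i][j - 1] + 1.0,       # insert a spike
                             dist[i - 1][j - 1] + shift)  # shift a spike
    return dist[n][m]


print(victor_purpura([1.0], [1.5], q=1.0))
print(victor_purpura([1.0], [5.0], q=1.0))
```

The dynamic programme runs in time proportional to the product of the two spike counts, so pairwise comparison of many long trains is exactly the kind of computationally demanding task the project refers to.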

Supervisor: Pawel Herman

 

Title: Visualisation of neural data

Theme: Visualisation, Simulations

Subject: Visualisation is one of the most neglected aspects of the rapidly developing field of computational biology. Only recently has a trend emerged for combining neural simulation frameworks with visualisation software. There is still a plethora of challenging problems that need to be urgently addressed (high-dimensional data, pre-processing, integration with simulation software, demands on purely visual aspects, an interactive environment) to render visualisation a practical tool in computational studies. Such a tool is envisaged to facilitate computational modelling and assist in demonstrating scientific findings.

Possible essay projects:

1)      Visualisation of existing data produced by models (different types of high-dimensional spatiotemporal data are available).

2)      Conceptual integration with a simulation environment to help with data pre-processing (or post-processing) and to facilitate an interactive mode with the user.

3)      Review of the state-of-the-art methodology and a motivated choice of a tool for the computational problem at hand.

Supervisor: Pawel Herman

 

Title: Optimisation and parameter search in computational modelling

Theme: Algorithms

Subject: A model's parameters have a decisive effect on its behaviour and dynamics. At the same time, the search for parameters is the most tedious component of computational modelling. Neural simulations are no exception. On the contrary, since they account for nonlinear and stochastic effects in brain data, parameters need to be carefully tuned to obtain a desirable functional and/or dynamical outcome. This optimisation procedure is commonly carried out manually on a trial-and-error basis. It is thus desirable to automate this tedious process by providing an effective parameter search and optimisation scheme. Two of the key challenges to address are the computational efficiency of the implemented method and the definition of a cost function based on the existing "manual" evaluation criteria. Tests in the project will be performed using existing neural models, or a small-scale simulation demo will be developed.

Possible essay projects:

1)      Define a cost function that reflects the fundamental model evaluation criteria and propose an effective way of calculating it.

2)      Propose a computationally effective way of evaluating the cost function (see 1).

3)      Review and propose a parameter search method (from the existing approaches) that matches the specific demands of computational modelling.
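
To make the relationship between a cost function and a search scheme concrete, here is a minimal sketch using plain random search, chosen only because it is the simplest baseline. The `simulate` function is a hypothetical stand-in for a real neural simulation, and the parameter names, bounds and target firing rate are invented for the example.

```python
import random


def simulate(params):
    """Hypothetical stand-in for a neural simulation; returns a mean firing rate."""
    return params["gain"] * 2.0 + params["bias"]


def cost(params, target_rate=10.0):
    """Squared deviation of the simulated rate from a desired target rate."""
    return (simulate(params) - target_rate) ** 2


def random_search(cost_fn, bounds, n_trials=200, seed=0):
    """Sample parameters uniformly within bounds and keep the cheapest candidate."""
    rng = random.Random(seed)
    best_params, best_cost = None, float("inf")
    for _ in range(n_trials):
        candidate = {name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}
        candidate_cost = cost_fn(candidate)
        if candidate_cost < best_cost:
            best_params, best_cost = candidate, candidate_cost
    return best_params, best_cost


bounds = {"gain": (0.0, 5.0), "bias": (0.0, 5.0)}
best_params, best_cost = random_search(cost, bounds)
print(best_params, best_cost)
```

In a real setting each call to the cost function runs a full simulation, so the number of trials, rather than the search logic itself, dominates the running time. This is why the efficiency of evaluating the cost function appears as a project in its own right.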

Supervisor: Pawel Herman

Title: QA Systems (how well do they really work?)

 QA systems treat free text kind of like a database and use simple English queries as the query language. So if the text is 'The Twelve Caesars' by Suetonius, and the query is, ‘Who killed Julius Caesar?’, then the answer should be 'Brutus' or, better, the sentence or paragraph in which the killing is described. Write a thesis which explores the state of the art in Question Answering systems. You should focus on either closed domain or open domain systems. You should also do some experiments where you compare and contrast various approaches and systems using a test corpus that you develop yourself or find on the internet. Critically discuss your findings. Are these types of systems reliable enough for common use cases? If not, what are the missing ingredients?

Supervisor: Michael Minock

Title: Go Go!

 Why is the ancient game of Go so hard for computers? Write a player that wins on n x n boards. How large can you get n to be where the system still beats you, the programmer? Discuss some of what others have tried to make competitive Go players. Do you think Go programs will beat all human players on the 19x19 Go board by 2050?  Give arguments.

Supervisor: Michael Minock

Title: Isovists Indexes.

 In navigation systems it is important to be able to quickly calculate which buildings/objects are viewable from a given x,y position. Let's investigate the 2D case. Simple polygons in the neighborhood of other simple polygons have a viewability area which defines the area from which the polygon may be seen. This area may be delimited via a (set of) polygon(s), not necessarily simple. Write a routine to compute these viewability polygons (aka Isovists) and implement it in a PostGIS index scheme to calculate the polygons viewable from any x,y point. Compare the performance of your result with naive line-of-sight run-time methods.
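
The naive run-time baseline mentioned above can be as simple as testing the sight line against every obstacle edge. The sketch below is one assumed form of such a baseline: polygons are lists of (x, y) vertices, visibility is tested towards a single target point rather than a whole polygon, and sight lines that merely touch a vertex or run along an edge are treated as unblocked.

```python
def orient(a, b, c):
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def segments_cross(p, q, r, s):
    """True if segments pq and rs cross at a single interior point."""
    return (orient(p, q, r) * orient(p, q, s) < 0
            and orient(r, s, p) * orient(r, s, q) < 0)


def is_visible(viewer, target, obstacles):
    """Naive line-of-sight test: check the sight line against every obstacle edge."""
    for polygon in obstacles:
        for i in range(len(polygon)):
            edge_start = polygon[i]
            edge_end = polygon[(i + 1) % len(polygon)]
            if segments_cross(viewer, target, edge_start, edge_end):
                return False
    return True


wall = [(4, -1), (6, -1), (6, 1), (4, 1)]
print(is_visible((0, 0), (10, 0), [wall]))   # sight line passes through the wall
print(is_visible((0, 0), (10, 0), []))       # nothing in the way
```

The cost of each query grows with the total number of obstacle edges, which is the behaviour a precomputed isovist index is meant to avoid.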

Supervisor: Michael Minock

Title: PostgreSQL Replication.

 There is a need these days for database systems that are horizontally scalable (being able to add additional servers to a running system and improve performance). PostgreSQL has the ability to replicate databases across servers. Is it possible to implement ACID transactions across multiple PostgreSQL instances? In principle, the answer seems to be yes, but it is certainly not easy -- consider distributed two-phase commit. Thus the transaction model must be something more like BASE. Specifically, categorize and implement simple examples of the types of transactions that can be supported over PostgreSQL with replication 'out of the box'.

Supervisor: Michael Minock

Title: Fly Catcher

If flies would stay put, it would be easy to catch them. If we wanted to determine which flies we would catch in an n-dimensional rectangle, we could just index fly positions using an R-Tree. But flies move about randomly. This is an instance of a 'moving objects database problem' and it has recently attracted a fair bit of attention. Review the literature on moving objects databases and implement and compare several approaches in terms of how many flies can be tracked and caught in a real-time simulation. Compare no-index approaches, baseline R-Tree implementations, and then one or more of the moving object indexes from the literature.
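
The no-index baseline can be sketched in a few lines: every query scans all fly positions, and every time step moves each fly by a small random amount. The Gaussian step model, the parameter `sigma` and the function names are assumptions made for this illustration.

```python
import random


def in_box(point, lower, upper):
    """True if the point lies inside the axis-aligned n-dimensional rectangle."""
    return all(lo <= x <= hi for x, lo, hi in zip(point, lower, upper))


def catch(flies, lower, upper):
    """No-index range query: scan every fly and return the indices of those caught."""
    return [i for i, position in enumerate(flies) if in_box(position, lower, upper)]


def step(flies, rng, sigma=0.1):
    """Move every fly by an independent Gaussian step in each dimension."""
    return [tuple(x + rng.gauss(0.0, sigma) for x in position) for position in flies]


rng = random.Random(0)
flies = [(0.5, 0.5), (2.0, 2.0), (0.9, 0.1)]
print(catch(flies, lower=(0.0, 0.0), upper=(1.0, 1.0)))
flies = step(flies, rng)  # positions change, so any index would need updating
```

The scan needs no maintenance when flies move, while an R-Tree must be updated on every step. That trade-off between query cost and update cost is what the comparison is meant to measure.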

Supervisor: Michael Minock

Title: Skärgården

There is a constant stream of spatio-temporal data, mostly numerical, that provides sea and weather prognoses for that magical, awe-inspiring place that we call 'Skärgården' (or the Stockholm Archipelago). Investigate methods to turn this flood of data into natural language reports (e.g. "It will be calm with scattered clouds Saturday morning and afternoon in Trälhavet, but at night the wind will pick up to 7 knots, blowing to the South.") Investigate and potentially reuse the methods developed by the natural language generation group in Aberdeen, Scotland, who generated weather reports for North Sea drilling operations.

Supervisor: Michael Minock

Title: Replication of Precise. 

The paper 'Towards a theory of natural language interfaces to databases' by Popescu, Etzioni and Kautz, Proc of IUI, 2003, developed a novel method to support natural language interfaces to databases. Over the graph of tokens representing the database, max-flow is computed over the tokens represented in the user's natural language query. From the resulting sub-graph an SQL expression is trivially generated. Crucially, the approach claims that it can identify 'semantically tractable' queries that can be answered with 100% confidence. The paper presented results over the GEO corpus. In this work, attempt to replicate their results and then critically evaluate their claims.

Supervisor: Michael Minock

 

Title: Replication of Lambda-WASP.

Baby hears word sequences coupled with baby's pre-linguistic understandings of objects and events in the world. From this baby learns knowledge that lets baby map from word sequences to meaning expressions (natural language understanding) and map from meaning expressions to word sequences (natural language generation). Raymond Mooney's group looked at this problem for natural language understanding in their 2007 paper 'Learning Synchronous Grammars for Semantic Parsing with Lambda Calculus'. Attempt to replicate the results of this work and evaluate their claims. Time permitting, review additional work that has looked at this problem more recently. Critically discuss.

Supervisor: Michael Minock

Title: "Give me something like ..."

 In large databases it is sometimes reasonable to ask a query such as, “give me the n objects most similar to a given object.” There is a well-developed literature on the similarity problem. Review the literature on the computation of similarity in databases. Apply this to a large data set (e.g. MusicBrainz, OpenStreetMap, etc.) and evaluate alternative approaches. It is quite likely that you will specialize your literature search and implementation to the single database/domain that you decide to focus on. Note that this will involve securing and focusing on a large open database in your domain of interest.
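
A brute-force version of the "n most similar objects" query gives a reference point against which indexed or approximate approaches can be compared. The sketch below assumes each object has already been turned into a numeric feature vector and uses cosine similarity as the measure. Both choices, and the toy data, are assumptions for this example only.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(query_id, features, n):
    """Return the ids of the n objects most similar to the query object."""
    query = features[query_id]
    scored = ((cosine_similarity(query, vector), object_id)
              for object_id, vector in features.items() if object_id != query_id)
    return [object_id for _score, object_id in heapq.nlargest(n, scored)]


# Invented feature vectors, for illustration only.
features = {"a": (1.0, 0.0), "b": (0.9, 0.1), "c": (0.0, 1.0), "d": (0.5, 0.5)}
print(most_similar("a", features, n=2))
```

Every query touches every object, so this only scales to modest data sets; on a large open database the interesting question is how much accuracy an indexed or approximate method gives up for its speed.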

Supervisor: Michael Minock