T-61.6080 Special Course in Bioinformatics II P (3-10 cr)

Presentation topics

The course covers probabilistic model-based approaches for capturing the structure of the functional relationships that underlie global gene expression. The models differ in how prior knowledge and assumptions are formulated, and in their level of complexity.


Clustering-based approaches

The simplest approach clusters genes according to their expression profiles, on the assumption that functionally related genes behave similarly. It can be used to analyse how gene function changes between experimental conditions, e.g. case versus control. Augmenting a clustering model (for instance, a mixture model) with additional layers of complexity can capture increasingly complex intuitions about the functional relationships between genes.
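
As a minimal sketch of the mixture-model starting point, the following fits a Gaussian mixture to expression profiles with EM. The shared spherical variance, the farthest-point initialisation, the function names and the toy profiles are all assumptions made for illustration; they are not taken from the course material.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_means(profiles, k):
    """Deterministic farthest-point initialisation (an illustrative choice)."""
    means = [list(profiles[0])]
    while len(means) < k:
        far = max(profiles, key=lambda p: min(sq_dist(p, m) for m in means))
        means.append(list(far))
    return means


def fit_mixture(profiles, k, n_iter=30):
    """EM for a k-component Gaussian mixture with one shared spherical variance.

    Returns (weights, means, responsibilities).
    """
    n, d = len(profiles), len(profiles[0])
    means = init_means(profiles, k)
    weights = [1.0 / k] * k
    var = 1.0
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each gene.
        # The Gaussian normalising constant is shared, so it cancels.
        resp = []
        for x in profiles:
            logp = [math.log(weights[c]) - 0.5 * sq_dist(x, means[c]) / var
                    for c in range(k)]
            top = max(logp)
            unnorm = [math.exp(lp - top) for lp in logp]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: re-estimate weights, means and the shared variance.
        for c in range(k):
            nc = max(sum(r[c] for r in resp), 1e-12)
            weights[c] = nc / n
            means[c] = [sum(r[c] * x[j] for r, x in zip(resp, profiles)) / nc
                        for j in range(d)]
        total_sq = sum(r[c] * sq_dist(x, means[c])
                       for r, x in zip(resp, profiles) for c in range(k))
        var = max(total_sq / (n * d), 1e-6)
    return weights, means, resp


# Toy data (made up): rows are genes, columns are experimental conditions.
profiles = [
    [0.1, 1.0, 2.1, 2.9],
    [0.0, 1.1, 1.9, 3.1],
    [-0.1, 0.9, 2.0, 3.0],
    [3.0, 2.1, 0.9, 0.0],
    [2.9, 1.9, 1.1, 0.1],
    [3.1, 2.0, 1.0, -0.1],
]
weights, means, resp = fit_mixture(profiles, k=2)
# Hard cluster label per gene: the component with the highest responsibility.
labels = [max(range(2), key=lambda c: r[c]) for r in resp]
print(labels)
```

The responsibilities are soft assignments, which is what makes the mixture model a natural base for the richer models: extra structure can be added as priors or further latent variables without changing the overall EM-style fitting idea.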



Relations between genes are often condition-specific, and biological states are usually related only through a limited number of genes. Biclustering is the computational problem of clustering both the rows and columns of a data matrix simultaneously, in order to uncover local regions of similarity.
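
To make "local regions of similarity" concrete, the sketch below scores a candidate bicluster with the mean squared residue, a coherence score introduced by Cheng and Church. It is only one of several possible criteria and is used here purely as an illustration; the function name and the toy matrix are assumptions.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the submatrix given by row and column indices.

    A score of 0 means every selected gene follows the same pattern across
    the selected conditions, up to an additive per-gene offset.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_r, n_c = len(rows), len(cols)
    row_means = [sum(r) / n_c for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_r)) / n_r for j in range(n_c)]
    overall = sum(row_means) / n_r
    total = sum((sub[i][j] - row_means[i] - col_means[j] + overall) ** 2
                for i in range(n_r) for j in range(n_c))
    return total / (n_r * n_c)


# Toy data (made up): 5 genes x 4 conditions. Genes 0-2 share an additive
# pattern over conditions 0-2 only; the rest of the matrix is unstructured.
matrix = [
    [1.0, 2.0, 4.0, 9.0],
    [2.0, 3.0, 5.0, 0.0],
    [0.0, 1.0, 3.0, 7.0],
    [6.0, 0.0, 2.0, 3.0],
    [3.0, 8.0, 1.0, 5.0],
]
local_score = mean_squared_residue(matrix, [0, 1, 2], [0, 1, 2])
global_score = mean_squared_residue(matrix, range(5), range(4))
print(local_score, global_score)
```

The local score is close to zero while the score of the full matrix is not, which illustrates why clustering rows alone over all conditions can miss structure that is confined to a subset of conditions.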


Regulatory modules

The expression patterns of distinct genes can be jointly, though only partly, explained by the expression of their known common transcriptional regulators.
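
One way to read "partly explained" is as the fraction of a target gene's expression variance accounted for by a regulator. The sketch below assumes a simple linear relationship with a single shared regulator, which is a deliberate simplification rather than the model used in the regulatory-module literature; the gene names and values are made up.

```python
def fit_regulator(regulator, target):
    """Least-squares fit of target expression on regulator expression.

    Returns (slope, intercept, r_squared), where r_squared is the fraction
    of the target's variance explained by the regulator.
    """
    n = len(regulator)
    mx = sum(regulator) / n
    my = sum(target) / n
    sxx = sum((x - mx) ** 2 for x in regulator)
    syy = sum((y - my) ** 2 for y in target)
    sxy = sum((x - mx) * (y - my) for x, y in zip(regulator, target))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Toy data (made up): one regulator and two targets over five conditions.
regulator = [0.0, 1.0, 2.0, 3.0, 4.0]
targets = {
    "gene_a": [1.0, 1.5, 4.0, 3.5, 6.0],   # appears activated
    "gene_b": [5.0, 4.2, 2.9, 2.5, 0.8],   # appears repressed
}
fits = {name: fit_regulator(regulator, y) for name, y in targets.items()}
for name, (slope, intercept, r2) in fits.items():
    print(name, round(slope, 2), round(r2, 2))
```

Both targets are explained by the same regulator, with opposite signs, and neither fit is perfect. A module model extends this idea by grouping genes that share regulators and by treating the grouping itself as unknown.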


Network models of gene function

Incorporating topological information about how genes interact can give a better picture of the functional relationships between them, and can be used to encode prior assumptions. Examples of such assumptions are that cellular processes are organised hierarchically, and that genes are pleiotropic, i.e. often involved in multiple functions.


Gene function prediction

Known information about gene function, e.g. Gene Ontology terms, can be incorporated into a probabilistic model to supervise learning. How can the structure of the labels be modelled? And how can we overcome the inherent bias of gene function labels towards what is already known?
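
One structural property of Gene Ontology labels is that an annotation to a term implies annotation to all of its ancestor terms (the "true path rule"). The sketch below propagates annotations upwards through a small directed acyclic graph. The term names, gene names and helper functions are hypothetical and do not correspond to real GO identifiers.

```python
def ancestors(term, parents):
    """All terms reachable from `term` by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate(annotations, parents):
    """Extend each gene's label set with the ancestors of its direct labels."""
    full = {}
    for gene, terms in annotations.items():
        labels = set(terms)
        for term in terms:
            labels |= ancestors(term, parents)
        full[gene] = labels
    return full


# Hypothetical toy ontology: child term -> list of parent terms.
# A term may have several parents, so this is a DAG rather than a tree.
parents = {
    "metabolism": ["root"],
    "signalling": ["root"],
    "lipid_metabolism": ["metabolism"],
    "lipid_signalling": ["lipid_metabolism", "signalling"],
}
annotations = {
    "gene_x": ["lipid_signalling"],
    "gene_y": ["metabolism"],
    "gene_z": [],  # no known function yet
}
full_labels = propagate(annotations, parents)
print(sorted(full_labels["gene_x"]))
```

Note that an unannotated gene such as gene_z is not a negative example: the absence of a label only reflects what has been studied so far, which is the bias the second question above refers to.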

Integration of data sources

Other sources of (partially) relevant data can be used to help predict gene function and to model functional gene modules.