The course covers probabilistic model-based approaches used to capture the structure of the functional relationships underlying global gene expression. The models differ in how prior knowledge and assumptions are formulated, and in their level of complexity.
Clustering based approaches
The simplest approach clusters genes according to their expression profiles (assuming that functionally related genes behave similarly) and can be used to analyse how gene function changes between experimental conditions, e.g. case versus control. Augmenting a clustering model (for instance a mixture model) with additional layers of structure can capture increasingly complex intuitions about the underlying functional relationships between genes.
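As a rough illustration of the mixture-model clustering idea above, the sketch below fits a mixture of spherical Gaussians to a small, made-up expression matrix using EM. The function name `fit_spherical_mixture`, the fixed shared variance, the deterministic initialisation and the toy data are all assumptions chosen for brevity; they are not taken from the course material.

```python
import math


def fit_spherical_mixture(profiles, n_components, n_iter=50, variance=1.0):
    """EM for a mixture of spherical Gaussians with a fixed, shared variance.

    profiles: list of equal-length lists (one expression profile per gene).
    Returns (means, weights, responsibilities).
    """
    n = len(profiles)
    dim = len(profiles[0])
    # Assumption: deterministic initialisation, spreading means over the rows.
    step = max(1, (n - 1) // max(1, n_components - 1))
    means = [list(profiles[min(k * step, n - 1)]) for k in range(n_components)]
    weights = [1.0 / n_components] * n_components
    resp = []

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each gene.
        resp = []
        for x in profiles:
            log_p = [
                math.log(weights[k])
                - sum((xi - mi) ** 2 for xi, mi in zip(x, means[k])) / (2.0 * variance)
                for k in range(n_components)
            ]
            top = max(log_p)  # subtract the maximum for numerical stability
            unnorm = [math.exp(lp - top) for lp in log_p]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])

        # M-step: re-estimate mixing weights and component means.
        for k in range(n_components):
            nk = sum(r[k] for r in resp)
            weights[k] = max(nk / n, 1e-12)  # floor keeps log() defined
            if nk > 0.0:
                means[k] = [
                    sum(r[k] * x[d] for r, x in zip(resp, profiles)) / nk
                    for d in range(dim)
                ]
    return means, weights, resp


# Toy data (hypothetical): 6 genes measured under 4 conditions.
profiles = [
    [2.0, 2.1, -1.9, -2.0],
    [1.8, 2.2, -2.1, -1.8],
    [2.2, 1.9, -2.0, -2.2],
    [-2.0, -1.9, 2.1, 2.0],
    [-2.1, -2.0, 1.9, 2.2],
    [-1.8, -2.2, 2.0, 1.9],
]

means, weights, resp = fit_spherical_mixture(profiles, n_components=2)
# Hard cluster label per gene: the component with the highest responsibility.
labels = [max(range(len(r)), key=lambda k: r[k]) for r in resp]
print(labels)
```

The responsibilities in `resp` are soft assignments, which is what distinguishes a mixture model from a hard clustering. A fuller model would also learn the variances and, in the nonparametric case, the number of components.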
Li, Y., Campbell, C., and Tipping, M. Bayesian automatic relevance determination algorithms for classifying gene expression data. Bioinformatics, 18(10):1332-9, 2002.
Chu, W., Ghahramani, Z., Falciani, F., and Wild, D. L. Biomarker discovery in microarray gene expression data with Gaussian processes. Bioinformatics, 21(16):3385-3393, 2005.
Nonparametric Bayesian clustering: Rasmussen, C. E., de la Cruz, B. J., Ghahramani, Z., and Wild, D. Modeling and Visualizing Uncertainty in Gene Expression Clusters using Dirichlet Process Mixtures. IEEE/ACM Transactions on Computational Biology and Bioinformatics, Epub ahead, 1-29 (03 2009).
Relations between genes are often condition-specific, and biological states are usually related only through a limited number of genes. Biclustering is the computational problem of clustering both the rows and columns of a data matrix simultaneously, in order to uncover local regions of similarity.
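To make the notion of a "local region of similarity" concrete, the sketch below computes the mean squared residue of a candidate bicluster, a coherence score used in Cheng and Church style biclustering. This is only one possible score, chosen here as an assumption for illustration. The function name `mean_squared_residue` and the toy matrix are hypothetical.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the submatrix given by `rows` and `cols`.

    A score of 0 means every selected gene follows the same additive
    pattern across the selected conditions.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(sub), len(sub[0])
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_rows * n_cols)


# Toy matrix (hypothetical): rows are genes, columns are conditions.
# Genes 0-1 behave coherently only under conditions 0-1.
matrix = [
    [1.0, 2.0, 9.0, 0.0],
    [3.0, 4.0, 0.0, 7.0],
    [8.0, 0.0, 5.0, 1.0],
    [0.0, 6.0, 2.0, 9.0],
]

local_score = mean_squared_residue(matrix, rows=[0, 1], cols=[0, 1])
global_score = mean_squared_residue(matrix, rows=[0, 1, 2, 3], cols=[0, 1, 2, 3])
print(local_score, global_score)
```

The selected submatrix scores far lower than the full matrix, which is the signal a biclustering algorithm searches for when choosing row and column subsets together.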
The expression patterns of distinct genes can be jointly, and at least partly, explained by the expression of some of their known common transcriptional regulators.
Network models of gene function
Incorporating topological information about the way genes interact can give a better picture of the functional relationships between genes, and can be used to encode prior assumptions. Examples of such assumptions are that cellular processes are organised hierarchically, and that genes are pleiotropic, i.e. often involved in multiple functions.
Gene function prediction
Known information about gene function, e.g. gene ontology terms, can be incorporated into a probabilistic model to supervise learning. How can the structure of the labels be modelled? And how can we overcome the inherent bias of the gene function labels towards what is already known?
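One simple way to respect label structure, sketched below, is to propagate each annotation up the term hierarchy so that a gene annotated with a specific term also carries all of its ancestor terms (the convention often called the true path rule in the Gene Ontology). The term names, the `parents` mapping and the function names are hypothetical; a probabilistic model of the label structure would go well beyond this preprocessing step.

```python
def ancestors(term, parents):
    """Return all ancestors of `term` in a DAG given as child -> list of parents."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate_annotations(annotations, parents):
    """Extend each gene's term set with the ancestors of every annotated term."""
    full = {}
    for gene, terms in annotations.items():
        expanded = set(terms)
        for term in terms:
            expanded |= ancestors(term, parents)
        full[gene] = expanded
    return full


# Hypothetical mini-hierarchy: a term may have more than one parent.
parents = {
    "dna_repair": ["dna_metabolism", "stress_response"],
    "dna_metabolism": ["metabolism"],
    "metabolism": ["root"],
    "stress_response": ["root"],
}
annotations = {"geneA": {"dna_repair"}, "geneB": {"metabolism"}}

full_annotations = propagate_annotations(annotations, parents)
print(sorted(full_annotations["geneA"]))
```

After propagation, the label vectors of different genes are consistent with the hierarchy, which makes them easier to use as multi-label supervision.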
Integration of data sources
Other sources of (partially) relevant data can be used to help predict gene function and to model functional gene modules.