The course covers probabilistic, model-based approaches for capturing the structure of the functional relationships that underlie global gene expression. The models differ in how prior knowledge and assumptions are formulated, and in their level of complexity.

Clustering based approaches

The simplest approach clusters genes according to their expression profiles (assuming that functionally related genes behave similarly) and can be used to analyse how gene function changes between experimental conditions, e.g. case vs. control. Augmenting a clustering model (for instance, a mixture model) with additional layers of structure can capture increasingly complex intuitions about the underlying functional relationships between the genes.
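
As a rough illustration of the mixture-model idea, the sketch below fits a mixture of spherical Gaussians to a handful of toy expression profiles with EM. It is a minimal sketch under stated assumptions rather than the method of any paper listed here: the data are made up, the variance is fixed and shared, the number of clusters is given in advance, and the function name `fit_mixture` is hypothetical.

```python
import math

def fit_mixture(profiles, k, n_iter=50, var=1.0):
    """EM for a mixture of spherical Gaussians with a fixed, shared variance.

    profiles: list of equal-length expression profiles (one per gene).
    Returns (means, weights, responsibilities).
    """
    n, d = len(profiles), len(profiles[0])
    # Assumption: deterministic initialisation from evenly spaced profiles.
    means = [list(profiles[(i * n) // k]) for i in range(k)]
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each cluster for each gene.
        resp = []
        for x in profiles:
            log_p = [
                math.log(weights[j])
                - sum((a - b) ** 2 for a, b in zip(x, means[j])) / (2.0 * var)
                for j in range(k)
            ]
            top = max(log_p)
            p = [math.exp(v - top) for v in log_p]
            total = sum(p)
            resp.append([v / total for v in p])
        # M-step: re-estimate mixing weights and cluster means.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = max(nj / n, 1e-12)  # floor keeps log() defined
            if nj > 0:
                means[j] = [
                    sum(r[j] * x[t] for r, x in zip(resp, profiles)) / nj
                    for t in range(d)
                ]
    return means, weights, resp

# Toy data (made up): six genes measured under four conditions.
profiles = [
    [2.0, 2.1, -1.9, -2.0],
    [1.8, 2.2, -2.1, -1.8],
    [2.1, 1.9, -2.0, -2.2],
    [-2.0, -1.9, 2.1, 2.0],
    [-2.2, -2.0, 1.9, 2.1],
    [-1.9, -2.1, 2.0, 1.8],
]
means, weights, resp = fit_mixture(profiles, k=2)
# Hard assignment: most probable cluster for each gene.
labels = [max(range(2), key=lambda j: r[j]) for r in resp]
```

The responsibilities in `resp` are the soft cluster memberships; the models in the readings below add further structure, such as priors over parameters or an unbounded number of clusters.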

Classification with sparsity inducing priors:
Li, Y., Campbell, C., and Tipping, M. Bayesian automatic relevance determination algorithms for classifying gene expression data. Bioinformatics, 18(10):1332-9, 2002.

Chu, W., Ghahramani, Z., Falciani, F. and Wild, D. L. Biomarker discovery in microarray gene expression data with Gaussian processes. Bioinformatics, 21(16):3385-3393, 2005.
Hierarchical clustering: Carrivick, L., Rogers, S., Clark, J., Campbell, C., Girolami, M. and Cooper, C. Identification of prognostic signatures in breast cancer microarray data using Bayesian techniques. J R Soc Interface, 3(8): 367–381, 2006.
Nonparametric Bayesian clustering: Rasmussen, C. E., de la Cruz, B. J., Ghahramani, Z., and Wild, D. Modeling and Visualizing Uncertainty in Gene Expression Clusters using Dirichlet Process Mixtures. IEEE/ACM Transactions on Computational Biology and Bioinformatics Epub ahead, 1-29 (03 2009)

Biclustering

Relations between genes are often condition-specific, and biological states are usually related only through a limited number of genes. Biclustering is the computational problem of clustering both the rows and columns of a data matrix simultaneously, in order to uncover local regions of similarity.
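
To make "local regions of similarity" concrete, the sketch below scores a candidate bicluster with the mean squared residue, a simple non-probabilistic coherence measure introduced by Cheng and Church. It is shown only as an illustration and is not necessarily the criterion used in the readings below; the matrix is made up and the function name `mean_squared_residue` is hypothetical. A score near zero means the selected genes follow the same additive pattern across the selected conditions.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the submatrix given by `rows` and `cols`.

    Each residue is a_ij - (row mean) - (column mean) + (overall mean),
    with all means taken inside the submatrix.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_r, n_c = len(rows), len(cols)
    row_means = [sum(r) / n_c for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_r)) / n_r for j in range(n_c)]
    overall = sum(row_means) / n_r
    total = sum(
        (sub[i][j] - row_means[i] - col_means[j] + overall) ** 2
        for i in range(n_r)
        for j in range(n_c)
    )
    return total / (n_r * n_c)

# Toy expression matrix (made up): rows are genes, columns are conditions.
# Genes 0-2 share an additive pattern over conditions 0-2 only.
expression = [
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 3.0, 4.0, 0.0],
    [4.0, 5.0, 6.0, 5.0],
    [7.0, 1.0, 8.0, 2.0],
]
coherent = mean_squared_residue(expression, rows=[0, 1, 2], cols=[0, 1, 2])
whole = mean_squared_residue(expression, rows=[0, 1, 2, 3], cols=[0, 1, 2, 3])
```

Here `coherent` is much smaller than `whole`, which is the kind of structure a global clustering of rows or columns alone would miss.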

Madeira, S. and Oliveira, A. Biclustering Algorithms for Biological Data Analysis: A Survey. IEEE Tr. CCB 1(1):24-45, 2004.
Gerber, G., Dowell, R., Jaakkola, T., and Gifford, D. Automated discovery of functional generality of human gene expression programs. PLoS Comp. Biol. 3(8):e148, 2007.
Flaherty, P., Giaever, G., Kumm, J., Jordan, M.I., and Arkin, A.P. A latent variable model for chemogenomic profiling. Bioinformatics 21(15):3286-93, 2005.

Regulatory modules

The expression patterns of distinct genes can be jointly, and in part, explained by the expression of their known common transcriptional regulators.

Segal, E. et al. Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nature Genetics 34(2):166-176, 2003.
Segal, E. et al. From signatures to modules: Understanding cancer using microarrays. Nature Genetics 37(suppl):S38-S45, 2005.

Network models of gene function

Incorporating topological information about how genes interact can give a better picture of the functional relationships between genes, and can be used to encode prior assumptions. Examples of such assumptions are that cellular processes are organised hierarchically, and that genes are pleiotropic, i.e. often involved in multiple functions.

Fraser, A. G. and Marcotte, E. M. A probabilistic view of gene function. Nature Genetics, 36(6):559-564, 2004.
Ideker, T. and Sharan, R. Protein networks in disease. Genome Research 18(4):644-652, 2008.
Wang, H. et al. A Complex-based Reconstruction of the Saccharomyces cerevisiae Interactome. Molecular & Cellular Proteomics 8(6):1361-1381, 2009.
Jaimovich, A., Elidan, G., Margalit, H., and Friedman, N. Towards an integrated protein-protein interaction network: A relational Markov network approach. Journal of Computational Biology 13(2):145-164, 2006.

Gene function prediction

Known information about gene function, e.g. Gene Ontology terms, can be incorporated into a probabilistic model to supervise learning. How can the structure of the labels be modelled? And how can we overcome the inherent bias of the gene function labels towards what is already known?
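
One basic piece of label structure is that Gene Ontology annotations are hierarchical: a gene annotated with a term is implicitly annotated with all of that term's ancestors. The sketch below closes a label set under this rule. It is a minimal illustration, not the probabilistic model of the reading below; the term names and the `toy_parents` hierarchy are invented, and the function names are hypothetical.

```python
def ancestors(term, parents):
    """Return all ancestors of `term` in a DAG given as {child: [parents]}."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen

def propagate(labels, parents):
    """Close a set of labels under the ancestor relation."""
    closed = set(labels)
    for t in labels:
        closed |= ancestors(t, parents)
    return closed

# Toy hierarchy (invented term names, not real Gene Ontology identifiers).
toy_parents = {
    "process": [],
    "metabolism": ["process"],
    "transport": ["process"],
    "lipid_metabolism": ["metabolism"],
    "lipid_transport": ["transport"],
}

gene_labels = propagate({"lipid_metabolism"}, toy_parents)
multi_labels = propagate({"lipid_metabolism", "lipid_transport"}, toy_parents)
```

A set of per-term predictions is consistent with the hierarchy only if it is unchanged by `propagate`, which is one simple check that a hierarchical multi-label predictor should pass.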

Barutcuoglu, Z., Schapire, R. E., and Troyanskaya, O. G. Hierarchical multi-label prediction of gene function. Bioinformatics, 22(7):830-836, 2006.

Integration of data sources

Other sources of (partially) relevant data can help with predicting gene function and with modelling functional gene modules.

Segal, E., Wang, H., Koller, D. Discovering molecular pathways from protein interaction and gene expression data. Bioinformatics 19(suppl.1):i264-i272, 2003.

Jansen, R., Greenbaum, D., Gerstein, M. Relating whole-genome expression data with protein-protein interactions. Genome Research 12(1):37-46, 2002.

Ucar, D., Beyer, A., Parthasarathy, S., Workman, C. Predicting functionality of protein-DNA interactions by integrating diverse evidence. Bioinformatics 25(12):i137-i144, 2009.

Myers, C.L., Troyanskaya, O. Context-sensitive data integration and prediction of biological networks. Bioinformatics 23(17): 2322-2330, 2007.