On Tuesday we tried using observed link information to improve clustering (with varying level of success). In this exercise we again utilize such link data, but this time the purpose is to infer the underlying network structure solely from the links. We will use graphical models for this purpose, and implement a simplified version of a factor graph presented in Denoising and untangling graphs using degree priors.
We will use the Bayes Net Toolbox (see "How to use the toolbox" for a brief manual) for implementing the model. This toolbox is not by default installed on the computers, so we have to start by installing it. Follow the steps mentioned below:
We will construct a model with following assumptions:
The model is constructed as a factor graph shown in Figure 2.(a) of the paper. The factors on the bottom correspond to the noise model of the observations, the middle ones to the hard restrictions, and the top ones to the priors.
Note that we are implementing the inefficient naive version, not the one described in subfigure (c). In practice we cannot run the experiments for more than 10-15 proteins. Why?
Start from run_factorgraph.m. It has already much of the code, in particular most of the boring book-keeping stuff and some of the more difficult operations related to linking the factors and nodes.
Your task is to fill in the missing parts, and in particular to study the predictions made by the model. Try looking for example at
A working solution to the problem is given in run_factorgraph_solution.m. Just run it (after again runnin add_BNT_to_path and add_rest to get the paths working), and look at the results. After that, you should read the code to see how it was done.
Implement the speedup described in the paper (simply by changing the set of factors) and apply the model on real protein-protein interaction data. You should now learn the factors related to degree distribution and the noise model from data, if possible.
Implement the untangling version of the model by introducing a second set of hidden indicator nodes. Use the model to untangle two different structures. You may generate the data by "summing" data from two different networks, so that you know the true answer to the untangling process.
You are at: CIS → T-61.5110 - Intensive course on modeling biological networks
Page maintained by t615110 (at) cis.hut.fi, last updated Friday, 17-Aug-2007 10:10:16 EEST