Back to the course web page

Factor graph exercise

On Tuesday we tried using observed link information to improve clustering (with varying level of success). In this exercise we again utilize such link data, but this time the purpose is to infer the underlying network structure solely from the links. We will use graphical models for this purpose, and implement a simplified version of a factor graph presented in Denoising and untangling graphs using degree priors.

Preliminary duties

We will use the Bayes Net Toolbox (see "How to use the toolbox" for a brief manual) for implementing the model. This toolbox is not by default installed on the computers, so we have to start by installing it. Follow the steps mentioned below:

Download FullBNT-1.0.3.zip to a suitable directory within your home directory, and go to that directory
"unzip FullBNT-1.0.3"
Download add_rest.m to FuLLBNT-1.0.3/BNT/.
"cd FullBNT-1.0.3/BNT/"
Start Matlab
"BNT_HOME='~/path_where_you_installed_the_package/FullBNT-1.0.3/'"
Run "add_BNT_to_path" and "add_rest" (may give some warnings; ignore those)
BNT should now be installed, and you can move to a different directory within Matlab to start working.

The model

We will construct a model with following assumptions:

We have a hidden node for each pair of proteins, telling whether the link really exists or not
Observed data is a noisy realization of that
We have a prior over the degree distribution of proteins (=how many proteins it is interacting with), and a hidden node telling the degree of each protein
The model must satisfy the constraints caused by the degrees

The model is constructed as a factor graph shown in Figure 2.(a) of the paper. The factors on the bottom correspond to the noise model of the observations, the middle ones to the hard restrictions, and the top ones to the priors.

Note that we are implementing the inefficient naive version, not the one described in subfigure (c). In practice we cannot run the experiments for more than 10-15 proteins. Why?

Instructions

Start from run_factorgraph.m. It has already much of the code, in particular most of the boring book-keeping stuff and some of the more difficult operations related to linking the factors and nodes.

Your task is to fill in the missing parts, and in particular to study the predictions made by the model. Try looking for example at

How do the values of the factors change things? Can you come up with factor for the degrees that would satisfy some particular properties, such as making the graph scale-free?
What is the difference between individual marginals and the joint distribution of more than one node?
How do the results change if you "observe" also something else than the measurements? Try for example specifying the degree of a certain protein.
Try different inference engines. Does something change? (Note that not all of them are applicable for factor graphs. Changing the model to a Bayes net with fgraph_to_bnet() helps.)

Solution

A working solution to the problem is given in run_factorgraph_solution.m. Just run it (after again runnin add_BNT_to_path and add_rest to get the paths working), and look at the results. After that, you should read the code to see how it was done.

Possible project works

Implement the speedup described in the paper (simply by changing the set of factors) and apply the model on real protein-protein interaction data. You should now learn the factors related to degree distribution and the noise model from data, if possible.
Implement the untangling version of the model by introducing a second set of hidden indicator nodes. Use the model to untangle two different structures. You may generate the data by "summing" data from two different networks, so that you know the true answer to the untangling process.