Clustering exercise
Read the problem description.
Data
All the data here (except the GO categories) has been fetched from Ulitsky et al. However, the data files are not the original ones but a subsample that has been processed into a form that is easy to read in Matlab. These files do not even
have the names of the genes or of the GO classes, so doing any biology with them
is impossible. I will later add versions that could be used in the project work.
Use "save as" from the browser to save these to your own folder.
Matlab-related stuff
- Start Matlab with the command "matlab".
- You can build everything on top of run_cluster.m (save it into your account and modify). In Matlab you can run a script
by simply writing its name (without .m) on the command line.
- compareToGo.m (also requires fisherextest.m) can be used as a basis for
using the GO categories for validation. It is a crude implementation
of enrichment testing without any corrections.
- The data contains only direct links between proteins. If you also want to know the indirect links (a path of length X between two proteins), you can use smoothLinks.m.
- If you haven't used Matlab before, you can consult for
example
- The graphical interface might sometimes feel sluggish or even
crash. You can also open Matlab without the GUI using "matlab -nojvm".
- The computing center has a wide range of Matlab toolboxes installed,
but you are not likely to need them.
- You can also do the exercise using R if you want to (it starts with the
command "R"); it is widely used in the bioinformatics community and
thus would be the preferred language for this kind of work.
On this course Matlab was chosen only because more students are familiar with it.
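The idea of indirect links mentioned in the list above can be illustrated with powers of an adjacency matrix. The following is only a minimal sketch under stated assumptions: it is not the contents of smoothLinks.m (which is not reproduced here), and the toy matrix A and all variable names are made up for illustration.

```matlab
% Toy undirected interaction network on 4 proteins (hypothetical data).
% Direct links: 1-2, 2-3, 3-4.
A = [0 1 0 0;
     1 0 1 0;
     0 1 0 1;
     0 0 1 0];

X = 2;                      % maximum path length to consider
n = size(A, 1);

% (A + I)^X is nonzero at (i,j) exactly when protein j can be reached
% from protein i in at most X steps (the identity lets a walk stay put).
R = (A + eye(n))^X > 0;
R(1:n+1:end) = false;       % ignore self-links on the diagonal

disp(double(R));            % 1 = linked by a path of length <= X
```

With X = 2, proteins 1 and 3 become linked through protein 2, while proteins 1 and 4 stay unlinked because the shortest path between them has length 3.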
Solution
You can take a look at run_cluster_solution.m if you couldn't finish the exercise. It is by no means a comprehensive treatment of the problem, but it has the commands for running the clusterings as well as an attempt at some external validation using the GO classes.
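For reference, external validation of one cluster against one GO category can be sketched as a one-sided Fisher's exact test, which is the upper tail of the hypergeometric distribution. This is not the code in compareToGo.m, fisherextest.m or run_cluster_solution.m; all counts below are hypothetical, and the Bonferroni correction is just one simple choice of multiple-testing correction assumed here for illustration.

```matlab
% Hypothetical counts, for illustration only
N = 1000;   % genes in the whole data set
K = 40;     % genes annotated to the GO category
n = 50;     % genes in the cluster
k = 8;      % cluster genes that are annotated to the category

% Log of the binomial coefficient, via gammaln to avoid overflow
logC = @(a, b) gammaln(a + 1) - gammaln(b + 1) - gammaln(a - b + 1);

% One-sided Fisher's exact test: P(overlap >= k) under the
% hypergeometric distribution
x = k:min(K, n);
p = sum(exp(logC(K, x) + logC(N - K, n - x) - logC(N, n)));

% Bonferroni correction over all (cluster, category) pairs tested
nTests = 200;                   % hypothetical number of tests
pAdj = min(1, p * nTests);

fprintf('p = %.3g, Bonferroni-adjusted p = %.3g\n', p, pAdj);
```

Only base Matlab functions are used, so no toolbox is needed. In practice nTests would be the number of cluster and category pairs actually tested.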
Page maintained by t615110 (at) cis.hut.fi,
last updated Friday, 17-Aug-2007 12:55:26 EEST