T-61.5040 /

Some resources for matrix derivation and Normal distributions:

Readings

Lecture 1: NFL Theorems are discussed in The Lack of A Priori Distinctions Between Learning Algorithms, David H. Wolpert, Neural Computation 8, 1341-1390 (1996), sections 1-4.

Lecture 3: A light article on Bayes appeared in The Economist (London), Jan 7, 2006, vol. 378, iss. 8459, p. 73. From a machine in the hut.fi domain, go to The Economist

The Dutch Book Argument shows that degrees of belief must obey the rules of probability: otherwise one can be led to accept a set of bets that guarantees a loss. More details are in Notes by D. Freedman
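A small worked example may make the argument concrete. The numbers and the function name below are illustrative assumptions, not taken from the lecture. Suppose an agent regards 0.6 as a fair price both for a bet paying 1 if A occurs and for a bet paying 1 if A does not occur. The two prices sum to more than 1, so a bookie who sells the agent both bets collects 1.2 and pays out exactly 1 whatever happens.

```python
def bettor_net_payoff(price_a, price_not_a, a_occurs):
    """Net payoff to an agent who buys one unit bet on A and one on not-A.

    Each bet pays 1 if its event occurs and 0 otherwise. The prices are
    what the agent considers fair, i.e. the agent's stated beliefs.
    """
    payout_a = 1.0 if a_occurs else 0.0
    payout_not_a = 0.0 if a_occurs else 1.0
    return (payout_a + payout_not_a) - (price_a + price_not_a)


# Incoherent beliefs: the "probabilities" of A and not-A sum to 1.2.
loss_if_a = bettor_net_payoff(0.6, 0.6, a_occurs=True)
loss_if_not_a = bettor_net_payoff(0.6, 0.6, a_occurs=False)

# Coherent beliefs: the prices sum to 1, so no sure loss can be built this way.
fair_if_a = bettor_net_payoff(0.6, 0.4, a_occurs=True)
```

With the incoherent prices the agent loses about 0.2 in both outcomes. If the prices summed to less than 1, the bookie would buy the bets from the agent instead, and the sure loss would appear in the same way.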

More on Cox's Axioms can be found in K. S. Van Horn, Constructing a logic of plausible inference: a guide to Cox's Theorem, International Journal of Approximate Reasoning 34, no. 1 (Sept. 2003), pp. 3-24. Citeseer

Lecture 4: The demo illustrating overfitting avoidance is based on a generalized linear model with Normal likelihood and a Normal prior. Details can be found in C.K.I. Williams, Prediction with Gaussian Processes, Learning in Graphical Models, ed. Michael I. Jordan, MIT Press, 1999.
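As a rough illustration of why a Normal prior discourages overfitting, the sketch below computes the posterior of a single weight w in the model y = w x + noise. This is a one-dimensional simplification and not the lecture demo itself; the function name, the data and the precision values are assumptions made up for the example.

```python
def posterior_weight(xs, ys, prior_precision, noise_precision):
    """Posterior mean and variance of w in y = w * x + noise.

    Assumes the prior w ~ N(0, 1 / prior_precision) and independent
    noise ~ N(0, 1 / noise_precision); both are illustrative choices.
    """
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Normal prior times Normal likelihood gives a Normal posterior:
    # precisions add, and the mean is a precision-weighted combination.
    post_precision = prior_precision + noise_precision * sxx
    post_mean = noise_precision * sxy / post_precision
    return post_mean, 1.0 / post_precision


xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.2]

# Least-squares estimate for comparison (the limit of a flat prior).
least_squares = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
post_mean, post_var = posterior_weight(xs, ys, prior_precision=1.0,
                                       noise_precision=1.0)
```

The posterior mean is pulled from the least-squares value toward the prior mean of zero, and the pull grows with `prior_precision`. With many basis functions the same shrinkage acts on every weight, which is what keeps a flexible model from fitting the noise.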

Lecture 6: Some background on the example illustrating a nonuniform prior for scale data: Benford's Law
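For reference, Benford's Law assigns probability log10(1 + 1/d) to the leading digit d. The same values follow from the scale-invariant density p(x) proportional to 1/x on [1, 10), since the integral from d to d + 1 of 1/(x ln 10) equals log10((d + 1)/d). A minimal sketch, with a function name chosen for this example:

```python
import math


def benford_probability(d):
    """Probability that the leading decimal digit is d under Benford's Law."""
    if d not in range(1, 10):
        raise ValueError("leading digit must be an integer from 1 to 9")
    return math.log10(1.0 + 1.0 / d)


# Table of leading-digit probabilities for d = 1..9.
benford_table = {d: benford_probability(d) for d in range(1, 10)}

for d, p in benford_table.items():
    print(d, round(p, 3))
```

The digit 1 leads about 30% of the time and the probabilities decrease steadily up to 9, in contrast to the uniform value of 1/9 for each digit.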

Lecture 8: Simulation methods are a very large and active topic: the lecture only introduced the basic ideas. Introductions to MCMC include

Basics of MCMC by Jeff Gill (Chapter 9 of his book). This is somewhat nonrigorous, but explains the ideas and some practical considerations quite well.
Estimation and Inference via Bayesian Simulation: An Introduction to Markov Chain Monte Carlo, S. Jackman, American Journal of Political Science, vol. 44, no. 2, April 2000, pp. 369-398. Available from JSTOR
Markov Chains for Exploring Posterior Distributions, L. Tierney, Annals of Statistics, vol. 22, no. 4, pp. 1701-1728. Covers the theoretical ideas concisely. Available from JSTOR
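To give a feel for the basic idea these introductions develop, here is a minimal random-walk Metropolis sampler. It is only a sketch under stated assumptions: the target (a standard Normal known up to a constant), the step size and all names are chosen for illustration and do not come from the lecture or the references.

```python
import math
import random


def log_target(x):
    """Unnormalised log density of a standard Normal (illustrative target)."""
    return -0.5 * x * x


def metropolis(log_density, n_samples, step=2.0, x0=0.0, seed=0):
    """Random-walk Metropolis with a symmetric Normal proposal."""
    rng = random.Random(seed)
    x = x0
    log_p = log_density(x)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        log_p_new = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)); the proposal
        # is symmetric, so no correction term is needed.
        accept_prob = math.exp(min(0.0, log_p_new - log_p))
        if rng.random() < accept_prob:
            x, log_p = proposal, log_p_new
        # A rejected proposal repeats the current state in the chain.
        samples.append(x)
    return samples
```

Calling `metropolis(log_target, 20000)` returns a correlated chain whose sample mean and variance should be close to 0 and 1. Only ratios of the density are used, which is why the normalising constant of a posterior never has to be computed.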

Knowing when a simulation has converged is a major difficulty in simulation methods. Perfect Sampling offers an interesting way of overcoming this issue in certain types of situations.

Lecture 9: An example of applying variational approximation is found in Fergus et al., Removing Camera Shake From a Single Image

Lecture 10: Some resources on the EM algorithm:

Mixture Densities, Maximum Likelihood, and the EM Algorithm, R. Redner and H. Walker, SIAM Review, 26(2), 1984.
Technical report by Jeff Bilmes
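The mixture-density setting treated in these references can be sketched in a few lines. The code below runs EM on a two-component one-dimensional Gaussian mixture; the initialisation, the variance floor and the function names are assumptions made for this example, not recommendations from the references.

```python
import math


def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_two_gaussians(data, n_iter=100):
    """EM for a two-component 1-D Gaussian mixture.

    Returns (weights, means, variances), each a list of length 2.
    """
    n = len(data)
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    # Crude initialisation (an assumption): means at the data extremes.
    weights = [0.5, 0.5]
    means = [min(data), max(data)]
    variances = [overall_var, overall_var]

    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            p = [weights[k] * normal_pdf(x, means[k], variances[k])
                 for k in range(2)]
            total = p[0] + p[1]
            resp.append([p[0] / total, p[1] / total])

        # M-step: responsibility-weighted maximum likelihood estimates.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2
                        for r, x in zip(resp, data)) / nk
            # Small floor (an assumption) to avoid a collapsing component.
            variances[k] = max(var_k, 1e-6)
    return weights, means, variances
```

On data drawn from two well-separated Normals the estimated means settle near the true ones within a few dozen iterations. The sketch does no convergence check and is not numerically hardened for extreme outliers.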

Lecture 11: A standard missing data reference is Statistical Analysis with Missing Data, Little and Rubin, 2nd ed., Wiley 2002.
Data augmentation was proposed in The Calculation of Posterior Distributions by Data Augmentation, M. Tanner and W. Wong, Journal of the American Statistical Association, vol. 82, no. 398, pp. 528-540, 1987.

Lecture 12: MacKay's book explains Gaussian Processes in chapter 45. Ways of solving classification problems using GPs:

Gaussian Processes for Classification: Mean-Field Algorithms, M. Opper and O. Winther, Neural Computation 12, 2655-2684, 2000.
Variational Gaussian Process Classifiers, M. Gibbs and D. MacKay, IEEE Transactions on Neural Networks 11, no. 6, 1458-1464, 2000.
Monte Carlo Implementation of Gaussian Process Models for Bayesian Regression and Classification, Radford Neal, Technical Report 9702, Dept of Statistics, U of Toronto, 1997. Found here

Lecture 13: Making Hard Decisions: an Introduction to Decision Analysis, Robert T. Clemen, Duxbury Press, 1995, is a gentle introduction to decision analysis.

For an example where costs do matter in model selection, see Stochastic optimization methods for cost-effective quality assessment in health, D. Fouskakis and D. Draper, 2005. (submitted article)

Model selection is discussed in chapter 28 of David MacKay's book. In particular, it explains the use of the evidence and the resulting preference for simpler models.
