## Readings

*Lecture 1:* The No Free Lunch (NFL) theorems are discussed in
*The Lack of A Priori Distinctions Between Learning Algorithms*,
David H. Wolpert, Neural Computation 8, 1341-1390 (1996), sections 1-4.

*Lecture 11:*
A standard missing data reference is
*Statistical Analysis with Missing Data*,
Little and Rubin, 2nd ed., Wiley 2002.

Data augmentation was proposed in
*The Calculation of Posterior Distributions by Data Augmentation*,
M. Tanner and W. Wong, Journal of the American Statistical Association,
vol. 82, no. 398, pp. 528-540, 1987.
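The Tanner-Wong scheme alternates an Imputation step (draw the missing data given the current parameter) with a Posterior step (draw the parameter given the completed data). A minimal sketch, using a toy model of my own choosing (data from N(mu, 1) with a flat prior on mu and values missing completely at random), not an example from the paper:

```python
import random
import statistics

random.seed(0)

# Toy setup (illustrative, not from Tanner-Wong): data are N(mu, 1);
# 20 of the 100 observations are missing completely at random.
mu_true = 3.0
observed = [random.gauss(mu_true, 1.0) for _ in range(80)]
n_missing = 20

def data_augmentation(observed, n_missing, n_iter=2000, burn_in=500):
    """Alternate the I-step and P-step of data augmentation for the
    mean of a N(mu, 1) model with a flat prior on mu."""
    n = len(observed) + n_missing
    mu = 0.0
    draws = []
    for t in range(n_iter):
        # I-step: impute the missing values given the current parameter.
        imputed = [random.gauss(mu, 1.0) for _ in range(n_missing)]
        full = observed + imputed
        # P-step: draw mu from its complete-data posterior N(ybar, 1/n).
        mu = random.gauss(sum(full) / n, (1.0 / n) ** 0.5)
        if t >= burn_in:
            draws.append(mu)
    return draws

draws = data_augmentation(observed, n_missing)
print(round(statistics.mean(draws), 2))  # close to the mean of `observed`
```

Since the missingness here carries no information, the draws of mu concentrate around the mean of the observed values, as they should.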

*Lecture 12:*
MacKay's book explains Gaussian Processes in Chapter 45. Ways of solving classification problems with GPs are listed below, under Lecture 10.

Some resources for matrix derivatives and Normal distributions:

*Lecture 3:* A light article on Bayes is in
*The Economist*, London, 7 Jan 2006, Vol. 378, Iss. 8459, p. 73
(accessible from machines in the hut.fi domain).

The Dutch Book Argument shows that priors must be probabilities unless one accepts irrational decisions. More details are in Notes by D. Freedman
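The argument can be made concrete with a little arithmetic. In this toy example of my own (not from Freedman's notes), an agent whose degrees of belief in "rain" and "no rain" sum to more than 1 accepts a pair of bets that guarantee a loss whatever happens:

```python
# Incoherent degrees of belief: they sum to 1.2 rather than 1.
p_rain, p_no_rain = 0.6, 0.6

# At its stated prices, the agent buys a $1 ticket on each outcome.
cost = p_rain + p_no_rain        # pays 1.2 in total
payoff_if_rain = 1.0             # only the "rain" ticket pays off
payoff_if_no_rain = 1.0          # only the "no rain" ticket pays off

loss_if_rain = cost - payoff_if_rain
loss_if_no_rain = cost - payoff_if_no_rain
print(round(loss_if_rain, 2), round(loss_if_no_rain, 2))  # 0.2 0.2: a sure loss
```

Beliefs that obey the probability axioms (here, summing to 1 over exhaustive, exclusive outcomes) admit no such book.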

More on Cox's Axioms can be found in K. S. Van Horn, *Constructing a
logic of plausible inference: a guide to Cox's Theorem,*
International Journal of Approximate Reasoning 34, no. 1 (Sept. 2003),
pp. 3-24 (available via CiteSeer).

*Lecture 4:* The demo illustrating overfitting avoidance is
based on a generalized linear model with Normal likelihood and a
Normal prior. Details can be found in C.K.I. Williams, *Prediction
with Gaussian Processes,* Learning in Graphical Models, ed. Michael
I. Jordan, MIT Press, 1999.
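The mechanism behind the demo can be sketched in a few lines. This is my own minimal version (scalar weight, notation of my choosing, not Williams' setup): a linear model y = w*x + noise with a Normal likelihood and a Normal prior on w, where the prior shrinks the estimate relative to maximum likelihood and thereby damps overfitting:

```python
import random

random.seed(1)

# Minimal sketch: y = w*x + noise, prior w ~ N(0, 1/alpha),
# noise precision beta (i.e. noise sd = 1/sqrt(beta)).
alpha = 1.0
beta = 4.0
w_true = 0.8

xs = [random.uniform(-1, 1) for _ in range(10)]
ys = [w_true * x + random.gauss(0, 0.5) for x in xs]

sxx = sum(x * x for x in xs)
sxy = sum(x * y for x, y in zip(xs, ys))

w_ml = sxy / sxx           # maximum-likelihood estimate
s = alpha + beta * sxx     # posterior precision of w
w_map = beta * sxy / s     # posterior mean: shrunk toward the prior mean 0

print(abs(w_map) < abs(w_ml))  # True: the prior shrinks the estimate
```

The ratio |w_map| / |w_ml| = beta*sxx / (alpha + beta*sxx) is always below 1, so the Bayesian estimate is systematically more conservative than the ML fit.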

*Lecture 6:* Some background on the example illustrating
a nonuniform prior for scale data:
Benford's Law
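Benford's law says that data spread over many orders of magnitude have leading digit d with frequency log10(1 + 1/d), which is what a scale-invariant (log-uniform) prior predicts. A small self-contained check, using the powers of 2 as a classic deterministic example (my choice of illustration):

```python
import math
from collections import Counter

# Leading digits of 2^1 .. 2^1000 versus the Benford prediction.
leading = [int(str(2 ** n)[0]) for n in range(1, 1001)]
counts = Counter(leading)

for d in range(1, 10):
    observed = counts[d] / 1000
    predicted = math.log10(1 + 1 / d)
    print(d, round(observed, 3), round(predicted, 3))
```

Digit 1 leads about 30% of the time, digit 9 under 5%: far from the uniform 1/9 one might naively expect.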

*Lecture 8:*
Simulation methods are a very large and active topic: the lecture
only introduced the basic ideas. Introductions to MCMC include

- Basics of MCMC by Jeff Gill (Chapter 9 of his book). This is somewhat nonrigorous, but explains the ideas and some practical considerations quite well.
- *Estimation and Inference via Bayesian Simulation: An Introduction to Markov Chain Monte Carlo*, S. Jackman, American Journal of Political Science, vol. 44, no. 2, April 2000, pp. 369-398. Available from JSTOR.
- *Markov Chains for Exploring Posterior Distributions*, L. Tierney, Annals of Statistics, vol. 22, no. 4, pp. 1701-1728, 1994. Covers the theoretical ideas concisely. Available from JSTOR.

Knowing when a simulation has converged is a major difficulty in
simulation methods. Perfect Sampling
offers an interesting way of overcoming this issue in certain types of situations.
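The basic idea the lecture introduced can be sketched as a random-walk Metropolis sampler. This is a minimal illustration of my own (target and tuning parameters chosen for the example), not code from any of the references above:

```python
import math
import random
import statistics

random.seed(0)

def metropolis(log_target, x0, n_steps, step=1.0):
    """Random-walk Metropolis: propose x' = x + Normal noise, accept
    with probability min(1, target(x') / target(x))."""
    x, lp = x0, log_target(x0)
    samples = []
    for _ in range(n_steps):
        prop = x + random.gauss(0, step)
        lp_prop = log_target(prop)
        if math.log(random.random()) < lp_prop - lp:
            x, lp = prop, lp_prop
        samples.append(x)
    return samples

# Target: standard Normal, log density up to an additive constant.
samples = metropolis(lambda x: -0.5 * x * x, x0=5.0, n_steps=20000)
kept = samples[2000:]          # discard burn-in before summarising
print(round(statistics.mean(kept), 1), round(statistics.pstdev(kept), 1))
```

The deliberately bad starting point x0 = 5 shows why burn-in (and convergence diagnostics in general) matter: early samples reflect the start, not the target.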

*Lecture 9:*
An example of applying variational approximation is
Fergus et al., *Removing Camera Shake from a Single Image*.

*Lecture 10:*
Some resources on the EM algorithm:

- *Mixture Densities, Maximum Likelihood, and the EM Algorithm*, R. Redner and H. Walker, SIAM Review, 26(2), 1984.
- Technical report by Jeff Bilmes
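The mixture-density setting treated by Redner and Walker can be sketched concretely. Below is a minimal EM iteration of my own construction for a two-component 1-D Gaussian mixture, simplified to known unit variances and equal weights so that only the means are estimated:

```python
import math
import random

random.seed(0)

# Synthetic data: two well-separated Gaussian components.
data = [random.gauss(-2, 1) for _ in range(200)] + \
       [random.gauss(3, 1) for _ in range(200)]

def em(data, mu1, mu2, n_iter=50):
    n = len(data)
    for _ in range(n_iter):
        # E-step: responsibility of component 1 for each point.
        r = []
        for x in data:
            p1 = math.exp(-0.5 * (x - mu1) ** 2)
            p2 = math.exp(-0.5 * (x - mu2) ** 2)
            r.append(p1 / (p1 + p2))
        # M-step: each mean becomes a responsibility-weighted average.
        mu1 = sum(ri * x for ri, x in zip(r, data)) / sum(r)
        mu2 = sum((1 - ri) * x for ri, x in zip(r, data)) / (n - sum(r))
    return mu1, mu2

mu1, mu2 = em(data, mu1=-1.0, mu2=1.0)
print(round(mu1, 1), round(mu2, 1))  # near the true means -2 and 3
```

Each iteration is guaranteed not to decrease the likelihood, which is the central property the EM literature above establishes.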

Ways of solving classification problems with Gaussian processes (for Lecture 12):

- *Gaussian Processes for Classification: Mean-Field Algorithms*, M. Opper and O. Winther, Neural Computation 12, 2655-2684, 2000.
- *Variational Gaussian Process Classifiers*, M. Gibbs and D. MacKay, IEEE Transactions on Neural Networks 11, no. 6, 1458-1464, 2000.
- *Monte Carlo Implementation of Gaussian Process Models for Bayesian Regression and Classification*, Radford Neal, Technical Report 9702, Dept. of Statistics, University of Toronto, 1997.

*Lecture 13:*
*Making Hard Decisions: an Introduction to Decision Analysis*,
Robert T. Clemen, Duxbury Press, 1995, is a gentle introduction to
decision analysis.

For an example where costs do matter in model selection, see
*Stochastic
optimization methods for cost-effective quality assessment in health*,
D. Fouskakis and D. Draper, 2005. (submitted article)

Model selection is discussed in Chapter 28 of David MacKay's book,
which explains in particular the use of the evidence and the resulting
preference for simpler models.
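The evidence-based preference for simpler models can be shown with a small worked example in the spirit of MacKay's chapter (the numbers and models here are my own): comparing a fixed fair coin M0 against a coin with unknown bias M1 (uniform prior on the bias), the evidence automatically penalises the flexible model when the data do not need its flexibility:

```python
from math import factorial

def evidence_fair(h, t):
    """p(data | M0): every sequence of h heads, t tails has prob 0.5^(h+t)."""
    return 0.5 ** (h + t)

def evidence_unknown_bias(h, t):
    """p(data | M1): integral of theta^h (1-theta)^t over theta in [0,1],
    which equals h! t! / (h+t+1)!."""
    return factorial(h) * factorial(t) / factorial(h + t + 1)

# Balanced data: the simpler fair-coin model has higher evidence.
print(evidence_fair(5, 5) > evidence_unknown_bias(5, 5))   # True
# Lopsided data: the flexible model earns its extra parameter.
print(evidence_fair(9, 1) > evidence_unknown_bias(9, 1))   # False
```

This is the Occam effect: M1 spreads its predictive probability over many possible data sets, so it scores lower than M0 unless the data actually call for a biased coin.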



Page maintained by t615040 (at) cis.hut.fi, last updated Wednesday, 25-Apr-2007 14:03:15 EEST