T-61.2010 Course assignment: #1 "Eigenfaces"

Applying PCA to a collection of human face images.


Student

Student: Joe Doe

Student ID: 12345A

Email address:

Preliminaries

Read about PCA and dimension reduction, look through "paper exercises" 1 and 2 from the 2nd round, check how to compute with matrices and how to find eigenvalues, and get familiar with Matlab.

Introduction

The goal of this assignment is to demonstrate the use of PCA on images of human faces.

For the assignment, a personalised Matlab data file, consisting of a set of gray scale images, can be accessed through the WWW pages of the course. Mail Ville.Viitaniemi at tkk.fi if you cannot find a data set with your student number.

Your task is to:

Keywords are "PCA - Principal Component Analysis" and "eigenfaces".

The code and documentation are to be written individually using the personal datasets. However, discussion in small groups or in the newsgroup is very welcome. Plagiarism is prohibited (the CS department instructions on dealing with plagiarism apply).

Documentation

You have to return a document and the code as email attachments by 15.1.2008 to vvi@cis.hut.fi. Use the subject "T-61.2010 assignment #1 STUD_ID", where STUD_ID is replaced by your own student ID.

The document must contain the name, student ID and email address of the student, a short description of the phases of the assignment, your results, and your answers to the questions (8). Convert your document into PDF.

Copy your Matlab code into a file and include it as an email attachment.

Some Matlab functions

Some hints can be found in the computer sessions, especially round #2.

There are some functions to be used:

For matrices:

In this work the data matrix consists of gray-level images. The Matlab Image Processing Toolbox (IPT) contains some useful functions (see "doc images"):

Image database

This assignment deals with human faces. There can be several images of each person. The faces are aligned, and all images are 19x19 pixels.

Fetch your own personal image file XXXXXY.mat (Topic #1) through http://www.cis.hut.fi/Opinnot/T-61.2010/Harjoitustyo/.

The CBCL database is from MIT: http://cbcl.mit.edu/cbcl/software-datasets/FaceData2.html

Copyright 2000. Center for Biological and Computational Learning at MIT and MIT. All rights reserved.

Permission to copy and modify this data, software, and its documentation only for internal research use in your organization is hereby granted, provided that this notice is retained thereon and on all copies. This data and software should not be distributed to anyone outside of your organization without explicit written authorization by the author(s) and MIT. It should not be used for commercial purposes without specific permission from the authors and MIT. MIT also requires written authorization by the author(s) to publish results obtained with the data or software and possibly citation of relevant CBCL reference papers.

We make no representation as to the suitability and operability of this data or software for any purpose. It is provided "as is" without express or implied warranty.

Example run

Variables used in the instructions below. The only variable you get from your personal data file is K.


(1) Reading face images to Matlab

Fetch your own file XXXXXY.mat (Topic #1) through the URL http://www.cis.hut.fi/Opinnot/T-61.2010/Harjoitustyo/. (NOTE! Mail Ville.Viitaniemi at tkk.fi if you cannot find data with your student number.)

Read the file with load. Now you have n images in the Matlab image matrix K. The images are gray scale and all of the same size (r rows and c columns). Using montage (or myMontage) you can draw all the faces at once.

load([opnro '_train.mat']);     % Put your own student number in opnro; the file contains the matrix K

r        = size(K,1);
c        = size(K,2);
n        = size(K,4);
datatype = class(K);
disp(['There are ' num2str(n) ' images']);
disp(['Each image is (' num2str(r) ' x ' num2str(c) ') and of type ' datatype]);

myMontage(K, 'Original images', 1);


There are 92 images
Each image is (19 x 19) and of type uint8

(2) Constructing data matrix

Cast each image to type double and reshape it into a column vector. You now have a matrix X (D) with d = r*c rows and n columns.

Image matrix and data matrix
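A minimal sketch of this step, assuming K is the 4-D array (r x c x 1 x n) implied by n = size(K,4) in step (1). The synthetic K below is only a stand-in so that the snippet runs without the personal data file; with real data, drop those lines and load K as in step (1).

```matlab
% Stand-in for the personal data set (assumption): 5 synthetic 19x19 images.
r = 19; c = 19; n = 5;
K = uint8(reshape(mod(0:(r*c*n - 1), 256), [r, c, 1, n]));

d = r * c;                      % number of pixels per image
X = reshape(double(K), d, n);   % each image becomes one column (column-major order)

disp(size(X));                  % d rows, n columns
```

Because reshape works in column-major order, column k of X is the same as K(:,:,1,k) stacked column by column, so the inverse reshape later restores the images exactly.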

(3) Principal component analysis (PCA)

Compute PCA. Compute the mean (size d x 1):

Subtract the mean from the matrix. Then compute the covariance matrix (size d x d):

Finally, compute the eigenvalues and eigenvectors of C_x using the command eig. (The eigenvectors should be sorted according to their eigenvalues.)
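The three operations above could be sketched as follows. This is an illustration under assumptions, not the course's reference solution: the small matrix X is a stand-in for the real data matrix, and the covariance is scaled by 1/n (the course material may use 1/(n-1) instead). The order in which eig returns eigenvalues should not be relied on, so the sketch sorts explicitly.

```matlab
% Stand-in data matrix (assumption): d = 6 "pixels", n = 8 "images".
X = [magic(4), magic(4)'; 1:8; (1:8).^2];
[d, n] = size(X);

mu = mean(X, 2);                    % mean image, size d x 1
X0 = X - repmat(mu, 1, n);          % subtract the mean from every column
Cx = (X0 * X0') / n;                % covariance matrix, size d x d (1/n scaling assumed)
Cx = (Cx + Cx') / 2;                % guard against round-off asymmetry

[E, L] = eig(Cx);                   % columns of E are eigenvectors, diag(L) the eigenvalues
[lambda, order] = sort(diag(L), 'descend');   % largest eigenvalue first
E = E(:, order);                    % reorder the eigenvectors to match
```

After this, E(:,1) is the direction of largest variance and lambda(1) the variance along it.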

(4) Eigenfaces of dataset

The eigenfaces are now the eigenvectors of C_x, stored as the columns of a matrix of size (d x d). In order to draw eigenfaces with montage, one has to

(5) Projecting images

Project the data points (faces) into the 2D space spanned by the two eigenvectors with the largest eigenvalues. Images that look similar should map close to each other.

Pick the two eigenvectors whose eigenvalues are the largest. Let W contain these vectors (d x 2); the projection is done using:

Y is (2 x n). Plot these n points in the xy-plane using plot(x,y,'x'), see "help plot". You can add text (numbers) with the command text.
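A possible sketch of the projection and the plot, assuming the projection is Y = W' * X0 applied to the mean-removed data (the usual PCA convention; the exact formula is not shown above). The small matrix X is again a stand-in for the real data.

```matlab
% Stand-in data (assumption): d = 4 pixels, n = 6 images.
X = [1 2 3 4 5 6; 2 1 4 3 6 5; 0 1 0 1 0 1; 5 3 1 2 4 6];
[d, n] = size(X);
X0 = X - repmat(mean(X, 2), 1, n);      % mean-removed data

[E, L] = eig((X0 * X0') / n);
[~, order] = sort(diag(L), 'descend');
W = E(:, order(1:2));                   % d x 2: the two leading eigenvectors

Y = W' * X0;                            % 2 x n: coordinates of each image

plot(Y(1, :), Y(2, :), 'x');
for k = 1:n
    text(Y(1, k), Y(2, k), [' ' num2str(k)]);   % label each point with its image index
end
xlabel('1st principal component');
ylabel('2nd principal component');
```

The index labels make it easy to look up which faces ended up next to each other in the montage from step (1).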

(6) Compression - projection to m-dimensional space

In this example the cumulative sum of the eigenvalues is computed. The error J when leaving out eigenvectors m+1..d:

This way you can choose a suitable number of eigenvectors to keep.

1..d | index number | cum. sum %
    1.0000    1.0000   54.9890
    2.0000    2.0000   65.0714
    3.0000    3.0000   71.6040
    4.0000    4.0000   76.1306
    5.0000    5.0000   79.4855
    6.0000    6.0000   82.1150
    7.0000    7.0000   84.4082
    8.0000    8.0000   85.9933
    9.0000    9.0000   87.2604
   10.0000   10.0000   88.4974
   11.0000   11.0000   89.5934
   12.0000   12.0000   90.5740

Compress the data so that the first m eigenvectors e_i are taken into account. Choose m so that 90 percent of the variance (energy) is retained.

Choose m=12 eigenvectors to reach 90%
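The choice of m can be sketched like this. The eigenvalues below are made-up stand-ins, not course data, and are assumed to be sorted in descending order already. The sketch also relies on the standard PCA result that the error from discarding eigenvectors m+1..d equals the sum of the corresponding eigenvalues.

```matlab
% Stand-in eigenvalues in descending order (assumption, not course data).
lambda = [50; 20; 10; 8; 6; 3; 2; 1];
d = numel(lambda);

cumPercent = 100 * cumsum(lambda) / sum(lambda);   % variance retained by the first m eigenvectors
J = sum(lambda) - cumsum(lambda);                  % J(m): sum of the discarded eigenvalues m+1..d

disp([(1:d)', cumPercent]);                        % index | cumulative sum in percent

m = find(cumPercent >= 90, 1, 'first');            % smallest m that reaches 90 %
disp(['Choosing m = ' num2str(m)]);
```

Using find with the 'first' option returns the smallest index that meets the threshold, which matches reading the table above from the top until the percentage first exceeds 90.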

In the compression, each original image is represented by a vector p whose dimension is only (m x 1). The elements of p express how much each eigenface contributes to the image. In total, the compressed data consists of the coefficient matrix (m x n) and the eigenvectors (d x m).
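As a rough sketch, the compression is a single matrix product. The names Wm and P are chosen here for illustration, and the data is a small stand-in. The sketch assumes the mean was removed before projecting, in which case the mean vector must be stored as well.

```matlab
% Stand-in data (assumption): d = 4 pixels, n = 6 images, keep m = 2 eigenvectors.
X = [1 2 3 4 5 6; 2 1 4 3 6 5; 0 1 0 1 0 1; 5 3 1 2 4 6];
m = 2;
[d, n] = size(X);
mu = mean(X, 2);
X0 = X - repmat(mu, 1, n);

[E, L] = eig((X0 * X0') / n);
[~, order] = sort(diag(L), 'descend');
Wm = E(:, order(1:m));         % d x m: the m leading eigenfaces

P = Wm' * X0;                  % m x n: column k is the compressed vector p of image k

fprintf('Stored values: %d (P) + %d (eigenvectors) + %d (mean) vs. %d (original)\n', ...
        numel(P), numel(Wm), numel(mu), numel(X));
```

With so few stand-in images the overhead of storing the eigenvectors dominates; the saving only becomes visible when n is large compared to d.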

(7) Decompression - back to d-dimensional space

Decompress the n vectors p (m values each) back to images x_hat (each image has d = r x c pixels).

Reshape the matrix back into image form and draw.
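A sketch of the round trip under the same assumptions as before (1/n covariance scaling, mean removed before projection, synthetic stand-in images). The names Xhat and Khat are illustrative. The last line checks the reconstruction numerically: with this scaling, the mean squared error per image should equal the sum of the discarded eigenvalues.

```matlab
% Stand-in data (assumption): n = 6 synthetic 4x3 images instead of the 19x19 faces.
r = 4; c = 3; n = 6; m = 3;
K = uint8(reshape(mod(7 * (0:(r*c*n - 1)).^2, 256), [r, c, 1, n]));
d = r * c;
X = reshape(double(K), d, n);
mu = mean(X, 2);
X0 = X - repmat(mu, 1, n);

[E, L] = eig((X0 * X0') / n);
[lambda, order] = sort(diag(L), 'descend');
Wm = E(:, order(1:m));
P = Wm' * X0;                               % compressed representation, m x n

Xhat = Wm * P + repmat(mu, 1, n);           % back to d dimensions, mean added back
Khat = uint8(reshape(Xhat, [r, c, 1, n]));  % same layout as K; uint8 rounds and saturates to 0..255

mse = mean(sum((X - Xhat).^2, 1));          % mean squared reconstruction error per image
disp([mse, sum(lambda(m+1:end))]);          % the two numbers should agree
```

Khat has the same shape and type as K, so it can be passed to the same drawing function that was used for the original images.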

(8) Questions

Think through and answer the following questions: