Reference:

Eerika Savia. Mathematical methods for a personalized information service. Master's thesis, Helsinki University of Technology, Department of Engineering Physics and Mathematics, November 1999.

Abstract:

Automated filtering is required to manage the flood of electronic information people face today. In electronic publishing it is both technically and financially feasible to offer personalized content to customers. In the SmartPush project at Helsinki University of Technology, a prototype of a personalized information filtering service is being developed and implemented. This thesis is a study of the mathematical methods involved in it.

A hierarchical metadata model is used to describe the documents. The representation of a user's interests is called a profile. The document descriptions and the profiles are used to filter documents and rank them in decreasing order of interest for each user. For this purpose a semantically meaningful distance measure is introduced. The distance measure should be asymmetric in order to find documents that match individual parts of the user profile instead of the profile as a whole.
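
As an illustration of why asymmetry helps, here is a minimal sketch. It is not the measure defined in the thesis: the function name `uncovered_share`, the flat topic-to-weight dictionaries, and the weighting formula are all assumptions made for this example. It measures how much of one description's weight the other fails to cover, so a document about a single topic can match one part of a broader profile perfectly.

```python
def uncovered_share(source, target):
    """Share of the source's weight that the target does not cover.

    Both arguments map topic names to weights in [0, 1].
    Hypothetical measure for illustration only; it is asymmetric
    because only topics present in `source` are counted.
    """
    total = sum(source.values())
    if total == 0:
        return 0.0
    missing = sum(w * (1.0 - target.get(topic, 0.0))
                  for topic, w in source.items())
    return missing / total


# Assumed example data: a profile with two interests, a document on one.
profile = {"sports/football": 1.0, "politics/eu": 1.0}
document = {"sports/football": 1.0}

doc_to_profile = uncovered_share(document, profile)  # document fully covered
profile_to_doc = uncovered_share(profile, document)  # half the profile unmatched
```

Measured from the document's side the distance is zero, while measured from the profile's side half of the profile is unmatched. A symmetric measure would penalize the document for not covering the whole profile.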

The profiles should be learned automatically from the users' feedback, for which purpose a learning method was designed. The learning method is also responsible for adapting the profile to changes in the user's interests and for forgetting the past. There is a fundamental difference between the objectives of the learning method and the distance measure, and it causes problems in the convergence of the suggested distance. However, a certain symmetric measure is proved to share the objective of the learning method.
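
One common way to combine learning from feedback with forgetting is an exponentially weighted update. The sketch below is an assumption for illustration, not the learning method designed in the thesis; the function `update_profile`, the `rate` parameter, and the feedback scale of 0 to 1 are all hypothetical.

```python
def update_profile(profile, document, feedback, rate=0.2):
    """Return a new profile nudged toward the feedback on a document.

    profile:  dict of topic -> interest weight in [0, 1]
    document: dict of topic -> weight; only its keys are used here
    feedback: user rating in [0, 1] (assumed scale)
    rate:     how strongly new feedback overrides the past (assumed)
    """
    updated = dict(profile)
    for topic in document:
        old = profile.get(topic, 0.0)
        # Older evidence shrinks by (1 - rate) at every update,
        # which is the "forgetting" part of the rule.
        updated[topic] = (1.0 - rate) * old + rate * feedback
    return updated


# Assumed example: two positive ratings on the same topic.
p0 = {"sports/football": 0.0}
p1 = update_profile(p0, {"sports/football": 1.0}, feedback=1.0, rate=0.5)
p2 = update_profile(p1, {"sports/football": 1.0}, feedback=1.0, rate=0.5)
```

A larger `rate` tracks changing interests faster but also forgets earlier feedback sooner.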

Social filtering is based on the likes and dislikes of similar users, so it does not rely on descriptions of the documents. For social filtering a similarity measure between two user profiles is needed. Various such distance measures are introduced and their suitability for the task is analyzed. One measure with desirable properties is suggested. Social filtering can be used with content-based profiles if slight modifications are made to the distance measure between the profiles.

The profiles can also be grouped into clusters of similar users. Here too, a distance between two profiles is needed, but in this case it should be symmetric and satisfy the triangle inequality.

Testing with real user data is future work, but the tests have already been designed. The intended performance scores are introduced.

Suggested BibTeX entry:

@mastersthesis{SaviaMSc99,
    author = {Eerika Savia},
    month = {November},
    school = {Helsinki University of Technology, Department of Engineering Physics and Mathematics},
    title = {Mathematical Methods for a Personalized Information Service},
    year = {1999},
}

See www.soberit.hut.fi ...