Principles of Data Mining
Table of Contents

Chapter 1: Introduction

1.1 Introduction to Data Mining
1.2 The Nature of Data Sets
1.3 Types of Structure: Models and Patterns
1.4 Data Mining Tasks
1.5 Components of Data Mining Algorithms
1.6 The Interacting Roles of Statistics and Data Mining
1.7 Data Mining: Dredging, Snooping, and Fishing
1.8 Summary
1.9 Further Reading

Chapter 2: Measurement and Data

2.1 Introduction
2.2 Types of Measurement
2.3 Distance Measures
2.4 Transforming Data
2.5 The Form of Data
2.6 Data Quality for Individual Measurements
2.7 Data Quality for Collections of Data
2.8 Conclusion
2.9 Further Reading

Chapter 3: Visualizing and Exploring Data

3.1 Introduction
3.2 Summarizing Data: Some Simple Examples
3.3 Tools for Displaying Single Variables
3.4 Tools for Displaying Relationships Between Two Variables
3.5 Tools for Displaying More Than Two Variables
3.6 Principal Components Analysis
3.7 Multidimensional Scaling
3.8 Further Reading

Chapter 4: Data Analysis and Uncertainty

4.1 Introduction
4.2 Dealing with Uncertainty
4.3 Random Variables and their Relationships
4.4 Samples and Statistical Inference
4.5 Estimation
4.6 Hypothesis Testing
4.7 Sampling Methods
4.8 Conclusion
4.9 Further Reading

Chapter 5: A Systematic Overview of Data Mining Algorithms

5.1 Introduction
5.2 An Example: The CART Algorithm for Building Tree Classifiers
5.3 The Reductionist Viewpoint on Data Mining Algorithms
5.3.1 Multilayer Perceptrons for Regression and Classification
5.3.2 The A Priori Algorithm for Association Rule Learning
5.3.3 Vector-Space Algorithms for Text Retrieval
5.4 Discussion
5.5 Further Reading
5.6 Bibliography

Chapter 6: Models and Patterns

6.1 Introduction
6.2 Fundamentals of Modeling
6.3 Model Structures for Prediction
6.3.1 Regression Models with Linear Structure
6.3.2 Local Piecewise Model Structures for Regression
6.3.3 Nonparametric "Memory-Based" Local Models
6.3.4 Stochastic Components of Model Structures
6.3.5 Predictive Models for Classification
6.3.6 An Aside: Selecting a Model of Appropriate Complexity
6.4 Models for Probability Distributions and Density Functions
6.4.1 General Concepts
6.4.2 Mixtures of Parametric Models
6.4.3 Joint Distributions for Unordered Categorical Data
6.4.4 Factorization and Independence in High Dimensions
6.5 The Curse of Dimensionality
6.5.1 Variable Selection for High-Dimensional Data
6.5.2 Transformations for High-Dimensional Data
6.6 Models for Structured Data
6.7 Pattern Structures
6.7.1 Patterns in Data Matrices
6.7.2 Patterns for Strings
6.8 Further Reading

Chapter 7: Score Functions for Data Mining Algorithms

7.1 Introduction
7.2 Scoring Patterns
7.3 Predictive versus Descriptive Score Functions
7.3.1 Score Functions for Predictive Models
7.3.2 Score Functions for Descriptive Models
7.4 Scoring Models with Different Complexities
7.4.1 General Concepts in Comparing Models
7.4.2 Bias-Variance Again
7.4.3 Score Functions That Penalize Complexity
7.4.4 Score Functions Using External Validation
7.5 Evaluation of Models and Patterns
7.6 Robust Methods
7.7 Further Reading
7.8 Bibliography

Chapter 8: Search and Optimization Methods

8.1 Introduction
8.2 Searching for Models and Patterns
8.2.1 Background on Search
8.2.2 The State-Space Formulation for Search in Data Mining
8.2.3 A Simple Greedy Search Algorithm
8.2.4 Systematic Search and Search Heuristics
8.2.5 Branch-and-Bound
8.3 Parameter Optimization Methods
8.3.1 Parameter Optimization: Background
8.3.2 Closed Form and Linear Algebra Methods
8.3.3 Gradient-Based Methods for Optimizing Smooth Functions
8.3.4 Univariate Parameter Optimization
8.3.5 Multivariate Parameter Optimization
8.3.6 Constrained Optimization
8.4 Optimization with Missing Data: The EM Algorithm
8.5 Online and Single-Scan Algorithms
8.6 Stochastic Search and Optimization Techniques
8.7 Further Reading

Chapter 9: Descriptive Modeling

9.1 Introduction
9.2 Describing Data by Probability Distributions and Densities
9.2.1 Introduction
9.2.2 Score Functions for Estimating Probability Distributions and Densities
9.2.3 Parametric Density Models
9.2.4 Mixture Distributions and Densities
9.2.5 The EM Algorithm for Mixture Models
9.2.6 Nonparametric Density Estimation
9.2.7 Joint Distributions for Categorical Data
9.3 Background on Cluster Analysis
9.4 Partition-Based Clustering Algorithms
9.4.1 Score Functions for Partition-Based Clustering
9.4.2 Basic Algorithms for Partition-Based Clustering
9.5 Hierarchical Clustering
9.5.1 Agglomerative Methods
9.5.2 Divisive Methods
9.6 Probabilistic Model-Based Clustering Using Mixture Models
9.7 Further Reading

Chapter 10: Predictive Modeling for Classification

10.1 A Brief Overview of Predictive Modeling
10.2 Introduction to Classification Modeling
10.2.1 Discriminative Classification and Decision Boundaries
10.2.2 Probabilistic Models for Classification
10.2.3 Building Real Classifiers
10.3 The Perceptron
10.4 Linear Discriminants
10.5 Tree Models
10.6 Nearest Neighbor Methods
10.7 Logistic Discriminant Analysis
10.8 The Naive Bayes Model
10.9 Other Methods
10.10 Evaluating and Comparing Classifiers
10.11 Feature Selection for Classification in High Dimensions
10.12 Further Reading

Chapter 11: Predictive Modeling for Regression

11.1 Introduction
11.2 Linear Models and Least Squares Fitting
11.2.1 Computational Issues in Fitting the Model
11.2.2 A Probabilistic Interpretation of Linear Regression
11.2.3 Interpreting the Fitted Model
11.2.4 Inference and Generalization
11.2.5 Model Search and Model Building
11.2.6 Diagnostics and Model Inspection
11.3 Generalized Linear Models
11.4 Artificial Neural Networks
11.5 Other Highly Parameterized Models
11.5.1 Generalized Additive Models
11.5.2 Projection Pursuit Regression
11.6 Further Reading

Chapter 12: Data Organization and Databases

12.1 Introduction
12.2 Memory Hierarchy
12.3 Index Structures
12.3.1 B-trees
12.3.2 Hash Indices
12.4 Multidimensional Indexing
12.5 Relational Databases
12.6 Relational Algebra
12.7 The Structured Query Language (SQL)
12.8 Query Execution and Optimization
12.9 Data Warehousing and On-Line Analytical Processing (OLAP)
12.10 Data Structures for OLAP
12.11 String Databases
12.12 Very Large Data Sets, Data Management, and Data Mining
12.12.1 Force the Data into Main Memory
12.12.2 Scalable Versions of Data Mining Algorithms
12.12.3 Special-Purpose Algorithms for Disk Access
12.12.4 Pseudo Data Sets and Sufficient Statistics
12.13 Further Reading

Chapter 13: Finding Patterns and Rules

13.1 Introduction
13.2 Rule Representations
13.3 Frequent Itemsets and Association Rules
13.3.1 Introduction
13.3.2 Finding Frequent Sets and Association Rules
13.4 Generalizations
13.5 Finding Episodes from Sequences
13.6 Selective Discovery of Patterns and Rules
13.6.1 Introduction
13.6.2 Heuristic Search for Finding Patterns
13.6.3 Criteria for Interestingness
13.7 From Local Patterns to Global Models
13.8 Predictive Rule Induction
13.9 Further Reading

Chapter 14: Retrieval by Content

14.1 Introduction
14.2 Evaluation of Retrieval Systems
14.2.1 The Difficulty of Evaluating Retrieval Performance
14.2.2 Precision versus Recall
14.2.3 Precision and Recall in Practice
14.3 Text Retrieval
14.3.1 Representation of Text
14.3.2 Matching Queries and Documents
14.3.3 Latent Semantic Indexing
14.3.4 Relevance Feedback
14.4 Automated Recommender Systems
14.5 Document and Text Classification
14.6 Image Retrieval
14.6.1 Image Understanding
14.6.2 Image Representation
14.6.3 Image Queries
14.6.4 Image Invariants
14.6.5 Generalizations of Image Retrieval
14.7 Time Series and Sequence Retrieval
14.7.1 Global Models for Time Series Data
14.7.2 Structure and Shape in Time Series
14.8 Summary
14.9 Further Reading

http://www.cis.hut.fi/Opinnot/T-61.6060/k2001/contents.shtml
jkseppan@mail.cis.hut.fi
Monday, 07-Jan-2002 15:04:14 EET