Title
Selected Data Mining Concepts
Abstract
In this multi-authored chapter, we introduce six key techniques from data mining that have been successfully applied to epidemiological data analysis.
Cluster Analysis is an unsupervised learning technique that takes large collections of data points and attempts to identify clusters of similar points. More formally, it tries to create clusters that optimize various mathematical properties, such as minimizing the maximum spread of each cluster or minimizing the sum of the spreads. A variety of algorithms have been proposed to create clusters from a data set, including k-means, hierarchical clustering, and expectation maximization.
Association Rules have become one of the central tools in transactional data mining. They can also be applied as a way of looking for correlations and associations within epidemiological data.
Support Vector Machines are a popular classification method that produces linear classifiers in the form of cutting hyperplanes, which divide the space into positive and negative examples. Through the so-called "kernel trick", they can also produce non-linear rules by projecting the data into a different space.
Statistical Techniques are a bedrock of epidemiological study, through sampling, hypothesis testing, experiment design, and inference techniques. An important suite of methods draws on Bayesian inference, giving more interpretable confidence bounds and simpler model fitting.
Boosting is a very influential technique for "boosting" the quality of classifier-based methods by varying the emphasis put on examples to focus the classification method on the "harder examples". The output of a number of rules is then combined by taking a weighted majority vote. The method has been extended to a wide variety of settings and applied to a large number of different scenarios.
External Memory Algorithms are needed when the volume of data being processed exceeds the internal (fast) memory of the machine, so that some data must reside on external (slow) memory, i.e., disks. Such methods give deep insights into the structure and properties of the underlying algorithms.
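To make the clustering objective mentioned in the abstract concrete, the following is a minimal sketch of k-means (Lloyd's algorithm) that tries to reduce the sum of the cluster spreads, measured here as squared Euclidean distance to each cluster's mean. It is an illustration only and is not taken from the chapter: the function names (`sq_dist`, `assign`, `kmeans`, `sum_of_spreads`), the toy data, and the naive choice of the first k points as starting centers are all assumptions made for this example.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
        for p in points
    ]


def kmeans(points, k, iters=50):
    """Lloyd's algorithm; returns (centers, labels).

    Assumption: the first k points serve as initial centers. This is a
    deliberately simple choice; practical code would use a better seeding.
    """
    centers = [tuple(p) for p in points[:k]]
    labels = assign(points, centers)
    for _ in range(iters):
        # Move each center to the mean of the points currently assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
        new_labels = assign(points, centers)
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return centers, labels


def sum_of_spreads(points, centers, labels):
    """Total squared distance from each point to its assigned center."""
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))


# Hypothetical toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(points, k=2)
cost = sum_of_spreads(points, centers, labels)
print(labels, cost)
```

Lloyd's algorithm only finds a local optimum of this objective, so the result can depend on the starting centers. The other objective named in the abstract, minimizing the maximum spread, would need a different algorithm.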
Year    Venue                               Field
2004    DISCRETE METHODS IN EPIDEMIOLOGY    Data mining, Computer science
DocType       Volume    ISSN
Conference    70        1052-1798
Citations    PageRank    References
0            0.34        0
Authors
6
Name               Order    Citations    PageRank
James Abello       1        699          62.19
Graham Cormode     2        3869         188.38
Dmitriy Fradkin    3        344          19.25
David Madigan      4        358          36.10
Ofer Melnik        5        55           5.91
Ilya Muchnik       6        323          47.03