Paper Info

Title
Selected Data Mining Concepts

Abstract
In this multi-authored chapter, we introduce six key techniques from data mining that have been successfully applied to epidemiological data analysis.
Cluster Analysis is an unsupervised learning technique that takes large collections of data points and attempts to identify clusters of similar points. More formally, it tries to create clusters that optimize various mathematical properties, such as minimizing the maximum spread of each cluster, or minimizing the sum of the spreads. A variety of algorithms have been proposed to create clusters from a data set, including k-means, hierarchical clustering, and expectation maximization.
Association Rules have become one of the central tools in transactional data mining. They can also be applied as a way of looking for correlations and associations within epidemiological data.
Support Vector Machines are a popular classification method that produces linear classifiers in the form of cutting hyperplanes, which divide the space into positive and negative examples. Through the so-called "kernel trick", the method can also produce non-linear rules by projecting the data into a different space.
Statistical Techniques are a bedrock of epidemiological study, through sampling, hypothesis testing, experiment design, and inference techniques. An important suite of methods draws on Bayesian inference, giving more interpretable confidence bounds and simpler model fitting.
Boosting is a very influential technique for "boosting" the quality of classifier-based methods by varying the emphasis put on examples to focus the classification method on the "harder examples". The output of a number of rules is then combined by taking a weighted majority vote. The method has been extended to a wide variety of settings, and applied to a large number of different scenarios.
External Memory Algorithms are needed when the volume of data being processed exceeds the internal (fast) memory of the machine, so that some data must reside on external (slow) memory, i.e., disks. Such methods give deep insights into the structure and properties of the underlying algorithms.
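As an illustration of the clustering objective mentioned in the abstract, the following is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is not taken from the chapter. It assumes that "spread" means the within-cluster sum of squared Euclidean distances, and the function names `assign`, `kmeans`, and `sum_of_spreads` are hypothetical names chosen for this sketch.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def assign(points, centers):
    """Assignment step: put each point in the cluster of its nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, centers, max_iterations=100):
    """Alternate assignment and centroid updates until the centers stop moving."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iterations):
        clusters = assign(points, centers)
        # Update step: move each center to the mean of its members.
        # An empty cluster keeps its previous center (a choice made for this sketch).
        new_centers = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else c
            for c, members in zip(centers, clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)


def sum_of_spreads(centers, clusters):
    """Objective under the stated assumption: total squared distance to each center."""
    return sum(dist(p, c) ** 2 for c, members in zip(centers, clusters) for p in members)
```

Calling `kmeans(points, initial_centers)` with a list of coordinate tuples and a list of starting centers returns the final centers and the clusters assigned to them. The result depends on the starting centers, so in practice the procedure is often run from several initializations and the run with the smallest objective is kept.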

Field	Value
Year	2004
Venue	DISCRETE METHODS IN EPIDEMIOLOGY
Field	Data mining, Computer science
DocType	Conference
Volume	70
ISSN	1052-1798
Citations	0
PageRank	0.34
References	0
Authors	6

Authors (6 rows)

Name	Order	Citations	PageRank
James Abello	1	699	62.19
Graham Cormode	2	3869	188.38
Dmitriy Fradkin	3	344	19.25
David Madigan	4	358	36.10
Ofer Melnik	5	55	5.91
Ilya Muchnik	6	323	47.03
