Abstract | ||
---|---|---|
We describe a novel approach for clustering collections of sets, and its application to the analysis and mining of categorical data. By "categorical data," we mean tables with fields that cannot be naturally ordered by a metric - e.g., the names of producers of automobiles, or the names of products offered by a manufacturer. Our approach is based on an iterative method for assigning and propagating weights on the categorical values in a table; this facilitates a type of similarity measure arising from the co-occurrence of values in the dataset. Our techniques can be studied analytically in terms of certain types of non-linear dynamical systems. We discuss experiments on a variety of tables of synthetic and real data; we find that our iterative methods converge quickly to prominently correlated values of various categorical fields. |
Year | DOI | Venue |
---|---|---|
2000 | 10.1007/s007780050005 | The VLDB Journal — The International Journal on Very Large Data Bases |
Keywords | Field | DocType |
novel approach,propagating weight,dynamical systems,categorical data,similarity measure,non-linear dynamical system,categorical value,certain type,iterative method,clustering collection,clustering categorical data,data mining,clustering,hypergraphs,dynamic system,iteration method | Data mining,Similarity measure,Iterative method,Computer science,Categorical variable,Constraint graph,Dynamical systems theory,Cluster analysis,Database | Journal |
Volume | Issue | ISSN |
8 | 3-4 | 1066-8888 |
Citations | PageRank | References |
190 | 82.62 | 20 |
Authors | ||
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
David Gibson | 1 | 1590 | 339.20 |
Jon Kleinberg | 2 | 22707 | 2358.90 |
Prabhakar Raghavan | 3 | 13351 | 2776.61 |