Title
Enhancing the Stability of Spectral Ordering with Sparsification and Partial Supervision: Application to Paleontological Data
Abstract
Recent studies have demonstrated the promise of data mining algorithms for the task of seriation in paleontological data (i.e., the age-based ordering of excavation sites). A prominent approach is spectral ordering, which computes a similarity measure between the sites and orders them so that similar sites become adjacent and dissimilar sites are placed far apart. In the paleontological domain, the similarity measure is based on the mammal genera whose remains are retrieved at each excavation site. Although spectral ordering performs well in the seriation task, it ignores the background knowledge that is naturally present in the domain: paleontologists can derive the ages of the excavation sites to within some accuracy. This age information is itself uncertain, however, so the best approach is to combine the background knowledge with the information on mammal co-occurrences. Motivated by this kind of partial supervision, we propose a novel semi-supervised spectral ordering algorithm. Our algorithm modifies the Laplacian matrix used in spectral ordering so that domain knowledge of the ordering is taken into account. It also performs feature selection (sparsification) by discarding the features that contribute most to the unwanted variability of the data in bootstrap sampling. The theoretical properties of the proposed algorithm are thoroughly analyzed, and it is demonstrated that the proposed framework enhances the stability of the spectral ordering output and yields computational gains.
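For orientation, the baseline that the abstract builds on can be sketched in a few lines. This is a minimal illustration of plain (unsupervised) spectral ordering only, not the semi-supervised, sparsified algorithm proposed in the paper. Several choices here are assumptions made for the sketch: the similarity is the raw count of shared genera, the Laplacian is unnormalized, and a simple power iteration stands in for a proper eigensolver. All function names are hypothetical.

```python
def cooccurrence_similarity(X):
    """S[i][j] = number of genera shared by sites i and j (X is a 0/1 site-by-genus matrix)."""
    n = len(X)
    return [[sum(a * b for a, b in zip(X[i], X[j])) for j in range(n)] for i in range(n)]


def laplacian(S):
    """Unnormalized graph Laplacian L = D - S, ignoring self-similarities (an assumption)."""
    n = len(S)
    L = [[-float(S[i][j]) if i != j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        L[i][i] = float(sum(S[i][j] for j in range(n) if j != i))
    return L


def fiedler_vector(L, iterations=500):
    """Approximate the eigenvector of the second-smallest eigenvalue of L.

    Power iteration on (c*I - L) with the constant vector projected out at
    every step; c is a Gershgorin upper bound on the largest eigenvalue of L.
    Assumes the similarity graph is connected.
    """
    n = len(L)
    c = 2.0 * max(L[i][i] for i in range(n))
    if c == 0.0:
        raise ValueError("no two sites share a genus; ordering is undefined")
    v = [float(i) for i in range(n)]  # deterministic, non-constant start
    for _ in range(iterations):
        mean = sum(v) / n
        v = [x - mean for x in v]  # remove the trivial constant eigenvector
        w = [c * v[i] - sum(L[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


def spectral_order(X):
    """Return site indices sorted by their Fiedler-vector coordinate."""
    f = fiedler_vector(laplacian(cooccurrence_similarity(X)))
    return sorted(range(len(X)), key=lambda i: f[i])


# Toy data (invented for illustration): rows are sites, deliberately shuffled,
# and columns are genera. Chronologically adjacent sites share one genus.
X = [
    [0, 0, 1, 1, 0, 0],  # site C
    [1, 1, 0, 0, 0, 0],  # site A
    [0, 0, 0, 0, 1, 1],  # site E
    [0, 1, 1, 0, 0, 0],  # site B
    [0, 0, 0, 1, 1, 0],  # site D
]
order = spectral_order(X)
print(order)  # row indices in recovered order; the direction of the ordering is arbitrary
```

The eigenvector is defined only up to sign, so the recovered sequence may run oldest-to-youngest or the reverse. Fixing that direction is one place where background knowledge about site ages, as discussed in the abstract, could plausibly enter.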
Year
2008
DOI
10.1109/ICDM.2008.120
Venue
ICDM
Keywords
paleontological domain, best approach, proposed algorithm, data mining algorithm, partial supervision, paleontological data, similarity measure, domain knowledge, mammal co-occurrences, age information, spectral ordering, laplacian, bootstrapping, palaeontology, data mining, laplacian matrix, feature selection, stability analysis, upper bound
Field
Data mining, Similarity measure, Feature selection, Computer science, Bootstrapping, Bootstrapping (statistics), Artificial intelligence, Laplacian matrix, Domain knowledge, Pattern recognition, Eigengap, Machine learning, Seriation (archaeology)
DocType
Conference
ISSN
1550-4786
Citations
4
PageRank
0.46
References
7
Authors
2
Name                  Order  Citations  PageRank
Dimitrios Mavroeidis  1      130        9.50
Ella Bingham          2      917        58.70