Abstract | ||
---|---|---|
It is estimated that less than 10 percent of the world's species have been described, yet species are being lost daily due to human destruction of natural habitats. The job of describing the earth's remaining species is made harder by the shrinking number of practicing taxonomists and the very slow pace of traditional taxonomic research. In this article, we tackle, from a novelty detection perspective, one of the most important and challenging research objectives in taxonomy: new species identification. We propose a unique and efficient novelty detection framework based on statistical depth functions. Statistical depth functions provide a "center-outward ordering" of multidimensional data, starting from the "deepest" point. In this sense, they can detect observations that appear extreme relative to the rest of the observations, i.e., novelty. Of the various statistical depths, the spatial depth is especially appealing because of its computational efficiency and mathematical tractability. We propose a novel statistical depth, the kernelized spatial depth (KSD), which generalizes the spatial depth via positive definite kernels. By choosing a proper kernel, the KSD can capture the local structure of a data set where the spatial depth fails. Observations with depth values less than a threshold are declared novel. The proposed algorithm is simple in structure: for a given kernel, the threshold is the only parameter. We give an upper bound on the false alarm probability of a depth-based detector, which can be used to determine the threshold. An experimental study demonstrates its excellent potential in new species discovery. |
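
The depth-and-threshold idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the usual sample spatial depth (one minus the norm of the averaged unit vectors from the sample points to `x`) and kernelizes it by writing feature-space inner products and distances in terms of a kernel `k`. The function names, the Gaussian bandwidth `sigma`, the normalization by the sample size, and the threshold value are all illustrative assumptions; the abstract's false alarm bound for choosing the threshold is not reproduced here.

```python
import math


def linear_kernel(a, b):
    """Plain inner product; with this kernel the sketch reduces to the spatial depth."""
    return sum(ai * bi for ai, bi in zip(a, b))


def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel; sigma is a hypothetical bandwidth choice."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernelized_spatial_depth(x, data, kernel):
    """Sample kernelized spatial depth of x with respect to data (a sketch).

    Uses <phi(x)-phi(y), phi(x)-phi(z)> = k(x,x) + k(y,z) - k(x,y) - k(x,z)
    and ||phi(x)-phi(y)||^2 = k(x,x) + k(y,y) - 2 k(x,y).
    """
    kxx = kernel(x, x)
    # Keep each sample with its feature-space distance to x; samples that
    # coincide with x contribute a zero unit vector and are skipped.
    terms = []
    for y in data:
        d2 = kxx + kernel(y, y) - 2.0 * kernel(x, y)
        if d2 > 1e-12:
            terms.append((y, math.sqrt(d2)))
    total = 0.0
    for y, dy in terms:
        kxy = kernel(x, y)
        for z, dz in terms:
            total += (kxx + kernel(y, z) - kxy - kernel(x, z)) / (dy * dz)
    # Assumed normalization: divide by the number of samples.
    return 1.0 - math.sqrt(max(total, 0.0)) / len(data)


def is_novel(x, data, kernel, threshold):
    """Declare x novel when its depth falls below the threshold."""
    return kernelized_spatial_depth(x, data, kernel) < threshold


if __name__ == "__main__":
    sample = [(-1.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
    for point in [(0.0, 0.0), (10.0, 0.0)]:
        depth = kernelized_spatial_depth(point, sample, gaussian_kernel)
        print(point, round(depth, 3), is_novel(point, sample, gaussian_kernel, 0.2))
```

With the linear kernel the function computes the ordinary spatial depth, so the same code can be used to compare the two depths on a data set. The double loop is quadratic in the sample size, which is acceptable for a small illustration but would need a vectorized Gram-matrix version for larger data.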
Year | DOI | Venue |
---|---|---|
2007 | 10.1109/ICDM.2007.10 | ICDM |
Keywords | Field | DocType |
depth-based novelty detection,statistical depth function,remaining species,depth-based detector,false alarm probability,taxonomic research,kernelized spatial depth,multidimensional data,learning (artificial intelligence),novel statistical depth,statistical depth functions,efficient novelty detection framework,mathematical tractability,taxonomy new species identification,various statistical depth,biology computing,new species identification,new species discovery,novelty detection perspective,data mining,machine learning,spatial depth,zoology,center-outward ordering,probability,upper bound,learning artificial intelligence,data gathering,body shape,geographic range,positive definite kernel | Data mining,Novelty detection,False alarm,Upper and lower bounds,Computer science,Artificial intelligence,Detector,Kernel (linear algebra),Pattern recognition,Positive-definite matrix,Local structure,Novelty,Machine learning | Conference |
ISSN | ISBN | Citations |
1550-4786 | 978-0-7695-3018-5 | 1 |
PageRank | References | Authors |
0.38 | 15 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Yixin Chen | 1 | 4326 | 299.19 |
Henry L. Bart Jr. | 2 | 6 | 1.54 |
Xin Dang | 3 | 139 | 9.85 |
Hanxiang Peng | 4 | 52 | 2.62 |