Title
A classification framework for multivariate compositional data with Dirichlet feature embedding
Abstract
Compositional data, which carry relative or structural information about the parts of a whole, occur commonly in many disciplines and practical scenarios. Yet relatively few works address the classification of multivariate compositional data with different numbers of parts using machine learning. This is because compositional data are inherently constrained to unit sum, so existing methods cannot be applied directly. In particular, multivariate analysis methods for compositional variables with unequal numbers of parts have not been sufficiently investigated. Moreover, designing a good classification model is a complicated task: besides the learning algorithm, data quality is also an essential determinant, yet it has rarely been considered. In this paper, we propose an effective framework for multivariate compositional data classification. Specifically, a Dirichlet feature embedding is applied to the original compositional features, with the goal of removing the constraint, obtaining high-quality training data, and reducing the dimension. A support vector machine is then used to build the classification model. Results from a simulation study and a real-world dataset show that the proposed method achieves good performance.
Year: 2021
DOI: 10.1016/j.knosys.2020.106614
Venue: Knowledge-Based Systems
Keywords: Multivariate compositional data, Classification, Feature embedding, Dirichlet distribution, Support vector machine
DocType: Journal
Volume: 212
ISSN: 0950-7051
Citations: 0
PageRank: 0.34
References: 0
Authors: 3
Name      Order  Citations  PageRank
Jie Gu    1      9          5.59
Bin Cui   2      1843       124.59
Shan Lu   3      2          1.04