Title
Variable Selection for Correlated High-Dimensional Data with Infrequent Categorical Variables: Based on Sparse Sample Regression and Anomaly Detection Technology
Abstract
We devised a new variable selection method for extracting infrequent event variables (binary data) that have one or more interaction effects, aimed particularly at industrial applications. The method combines sparse sample regression (SSR), a recent form of just-in-time (JIT) modeling, with the T² and Q statistics, which are typical anomaly detection methods. JIT modeling is based on locally weighted regression and copes with changes in process characteristics as well as nonlinearity. In the proposed method, modeling is performed multiple times using variables whose effects on the objective variable are known. The samples used for each model are selected and weighted automatically by SSR. By using the T² and Q statistics to evaluate how far the estimated coefficients of each created model are from those of the reference model, the method detects models that contain more (or fewer) samples in which events affecting the objective variable occur. Variable selection is then performed by ranking the ratio of events among the samples used by the models detected as abnormal. The method was verified with synthetic data: it extracted one of the six variables with true effects from a total of 5000 variables. A summary of the verification with actual data is also presented.
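The anomaly detection step described in the abstract can be illustrated with a minimal sketch of the T² and Q statistics. This is a generic PCA-based formulation, not the authors' implementation: the function name `t2_q_statistics`, its parameters, and the idea of passing reference-model coefficient vectors as rows of `reference` are assumptions made for illustration, and the SSR step that would produce the coefficient vectors is not shown.

```python
import numpy as np


def t2_q_statistics(reference, candidates, n_components):
    """Return Hotelling T^2 and Q (squared prediction error) per candidate row.

    Hypothetical helper: `reference` holds rows regarded as normal (for
    example, coefficient vectors of reference models) and `candidates`
    holds rows to be scored. Assumes every column of `reference` has
    nonzero variance.
    """
    reference = np.asarray(reference, dtype=float)
    candidates = np.asarray(candidates, dtype=float)

    # Standardize both sets with the reference mean and standard deviation.
    mean = reference.mean(axis=0)
    std = reference.std(axis=0, ddof=1)
    ref_z = (reference - mean) / std
    cand_z = (candidates - mean) / std

    # Fit PCA on the reference rows via singular value decomposition.
    _, singular_values, vt = np.linalg.svd(ref_z, full_matrices=False)
    loadings = vt[:n_components].T
    variances = singular_values[:n_components] ** 2 / (len(reference) - 1)

    # T^2: variance-scaled squared distance inside the retained subspace.
    scores = cand_z @ loadings
    t2 = np.sum(scores ** 2 / variances, axis=1)

    # Q: squared norm of the part not explained by the retained components.
    residual = cand_z - scores @ loadings.T
    q = np.sum(residual ** 2, axis=1)
    return t2, q
```

Under these assumptions, rows with a large T² or Q value lie far from the reference distribution and would be the ones flagged as abnormal; the thresholds used for flagging are not specified in the abstract and are left out here.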
Year
2021
DOI
10.1007/978-981-16-2765-1_9
Venue
INTELLIGENT DECISION TECHNOLOGIES, KES-IDT 2021
Keywords
Sparse sample regression, Just-in-time modeling, Anomaly detection, Hotelling T² statistic, Q statistic, Variable selection, Interaction effect
DocType
Conference
Volume
238
ISSN
2190-3018
Citations
0
PageRank
0.34
References
0
Authors
2
Name | Order | Citations | PageRank
Yuhei Kotsuka | 1 | 0 | 0.34
Sumika Arima | 2 | 0 | 0.34