Title
Survey on establishing the optimal number of factors in exploratory factor analysis applied to data mining
Abstract
In many kinds of research, including studies in agriculture and the plant sciences, large quantities of data are obtained that must be analyzed using data mining techniques. Data mining sometimes involves applying methods of statistical data analysis. Exploratory Factor Analysis (EFA) is frequently used for data reduction and structure detection in data mining. In this survey, we study EFA applied to data mining, focusing on the problem of establishing the optimal number of factors to retain. The number of factors to retain is the most important decision to make after factor extraction in EFA. Many researchers have discussed criteria for choosing the optimal number of factors. Extracting too few or too many factors is a mistake, and an inappropriate number of factors may lead to erroneous conclusions. We present a comprehensive review of the state of the art on this subject, focusing on the most frequently applied factor selection methods: the Kaiser criterion, Cattell's scree test, and Monte Carlo parallel analysis. We highlight that in some research, depending on its specifics, it is important to analyze the total cumulative variance explained by the selected number of extracted factors: the extracted factors must explain at least a minimum threshold of cumulative variance. The ExtrOptFact algorithm presents the steps to be performed in EFA to select the optimal number of factors. For validation, we present a case study on data obtained in our experimental study of the Brassica napus plant. Applying the ExtrOptFact algorithm with Principal Component Analysis led to the selection of three components, named Qualitative, Generative, and Vegetative, which explained 92% of the total cumulative variance.
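The selection criteria named in the abstract can be illustrated with a minimal sketch. It is not the ExtrOptFact algorithm, whose steps are not given in this record. It assumes the eigenvalues are taken from the correlation matrix, and every function name and parameter (`n_sim`, `percentile`, `seed`) is a hypothetical choice made for illustration. Cattell's scree test is a visual inspection of the eigenvalue plot and is not covered.

```python
import numpy as np


def eigenvalues_of_correlation(data):
    """Eigenvalues of the correlation matrix, sorted in descending order."""
    corr = np.corrcoef(data, rowvar=False)
    return np.sort(np.linalg.eigvalsh(corr))[::-1]


def kaiser_count(eigvals):
    """Kaiser criterion: count eigenvalues greater than 1."""
    return int(np.sum(np.asarray(eigvals) > 1.0))


def parallel_analysis_count(data, n_sim=200, percentile=95, seed=0):
    """Monte Carlo parallel analysis (illustrative variant).

    Counts the leading observed eigenvalues that exceed the chosen
    percentile of eigenvalues obtained from random normal data of the
    same shape. n_sim, percentile, and seed are assumed defaults.
    """
    rng = np.random.default_rng(seed)
    n, p = data.shape
    sims = np.empty((n_sim, p))
    for i in range(n_sim):
        sims[i] = eigenvalues_of_correlation(rng.standard_normal((n, p)))
    threshold = np.percentile(sims, percentile, axis=0)
    observed = eigenvalues_of_correlation(data)
    count = 0
    for obs, thr in zip(observed, threshold):
        if obs <= thr:
            break  # stop at the first component that does not beat chance
        count += 1
    return count


def cumulative_variance(eigvals, k):
    """Fraction of total variance explained by the first k components."""
    eigvals = np.asarray(eigvals, dtype=float)
    return float(eigvals[:k].sum() / eigvals.sum())


if __name__ == "__main__":
    # Synthetic data: two latent factors, three noisy indicators each.
    rng = np.random.default_rng(1)
    factors = rng.standard_normal((300, 2))
    data = np.repeat(factors, 3, axis=1) + 0.5 * rng.standard_normal((300, 6))

    eigvals = eigenvalues_of_correlation(data)
    k = parallel_analysis_count(data)
    print("Kaiser criterion:", kaiser_count(eigvals))
    print("Parallel analysis:", k)
    print("Cumulative variance for k components:", cumulative_variance(eigvals, k))
```

In practice the retained count would then be checked against a minimum cumulative variance threshold chosen for the research at hand, as the abstract describes; the value of that threshold is not stated in this record.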
Year
2019
DOI
10.1002/widm.1294
Venue
WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY
Keywords
data reduction, exploratory factor analysis (EFA), establishing the number of extracted factors in EFA, researches performed on complex biological systems, structure detection, statistical methods in data mining
Field
Data mining, Computer science, Exploratory factor analysis, Artificial intelligence, Machine learning, Data reduction
DocType
Journal
Volume
9
Issue
2
ISSN
1942-4787
Citations
0
PageRank
0.34
References
12
Authors
3
Name, Order, Citations, PageRank
Barna Laszlo Iantovics, 1, 19, 8.47
Corina Rotar, 2, 4, 1.09
Florica Morar, 3, 0, 0.34