Title | ||
---|---|---|
Computing Mutual Information Of Big Categorical Data And Its Application To Feature Grouping |
Abstract | ||
---|---|---|
This paper develops MiCS, a parallel computing system for the mutual information of big categorical data on the Spark computing platform. By applying a column-wise transformation scheme, the MiCS algorithm is conducive to processing the large number of highly repetitive mutual-information calculations among feature pairs. To improve the efficiency of MiCS and the utilization rate of Spark cluster resources, we adopt a virtual partitioning scheme that balances load while mitigating the data skewness problem in the Spark shuffle process. |
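
The quantity the abstract refers to can be illustrated on a single machine. The sketch below is not the MiCS implementation: it uses no Spark and no virtual partitioning, and the function names `mutual_information` and `pairwise_mi` are hypothetical. It only shows how turning rows into columns lets each feature pair be scored from simple value counts, assuming the standard definition I(X;Y) = Σ p(x,y) log(p(x,y) / (p(x) p(y))) with natural logarithms.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(xs, ys):
    """Mutual information (in nats) of two equal-length categorical columns."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) value pair
    px = Counter(xs)              # marginal counts for the first column
    py = Counter(ys)              # marginal counts for the second column
    mi = 0.0
    for (x, y), c in joint.items():
        # (c/n) * log((c/n) / ((px/n) * (py/n))) simplifies to the form below.
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def pairwise_mi(rows):
    """Score every feature pair; rows is a list of equal-length tuples."""
    columns = list(zip(*rows))  # row-wise records -> column-wise view
    return {
        (i, j): mutual_information(columns[i], columns[j])
        for i, j in combinations(range(len(columns)), 2)
    }


if __name__ == "__main__":
    rows = [("a", "x", "p"), ("a", "x", "q"), ("b", "y", "p"), ("b", "y", "q")]
    for pair, score in pairwise_mi(rows).items():
        print(pair, round(score, 4))
```

In this toy data the first two features determine each other, so their score is log 2, while the third feature is independent of both and scores zero. The number of pairs grows quadratically with the number of features, which is the repetitive workload the abstract describes distributing across a cluster.
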
Year | DOI | Venue |
---|---|---|
2020 | 10.1109/ICDE48307.2020.00210 | 2020 IEEE 36th International Conference on Data Engineering (ICDE 2020) |
Keywords | DocType | ISSN |
---|---|---|
Parallel Mutual-information Computation, Feature Grouping, Data Skewness, Big Categorical Data, Spark | Conference | 1084-4627 |
Citations | PageRank | References |
---|---|---|
0 | 0.34 | 0 |
Authors | ||
---|---|---|
4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Junli Li | 1 | 2 | 1.70 |
Chaowei Zhang | 2 | 0 | 1.69 |
Jifu Zhang | 3 | 95 | 19.42 |
Xiao Qin | 4 | 1836 | 125.69 |