Determining the Real Data Completeness of a Relational Dataset. - Citegraph

Paper Info

Title
Determining the Real Data Completeness of a Relational Dataset.

Abstract
Low data quality is a serious problem in the era of big data: it can severely reduce the usability of data, mislead or bias querying, analysis, and mining, and lead to huge losses. Incomplete data is common in low-quality data, so it is necessary to determine the completeness of a dataset to provide hints for follow-up operations on it. Little existing work focuses on the completeness of a dataset, and such work treats all missing values as unknown values. In this paper, we study how to determine the real data completeness of a relational dataset. By taking advantage of given functional dependencies, we aim to determine some missing attribute values from other tuples and to identify the attribute cells that are truly missing. We propose a data completeness model, formalize the problem of determining the real data completeness of a relational dataset, and give a lower bound on the time complexity of this problem. We propose two optimal algorithms that determine the data completeness of a dataset in different cases. We empirically show the effectiveness and scalability of our algorithms on both real-world and synthetic data.
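
The idea in the abstract, that a functional dependency lets some missing cells be recovered from other tuples so that only the remaining cells count as really missing, can be illustrated with a small sketch. This is not the paper's algorithm or its completeness model. The function names `infer_missing` and `completeness`, the cell-ratio measure, and the toy `zip`/`city`/`state` data are all assumptions made for illustration, and the sketch assumes the data does not violate the given dependencies.

```python
from typing import Optional

# Hypothetical representations: a row maps attribute names to values (None = missing),
# and a functional dependency is (left-hand attributes, right-hand attribute).
Row = dict[str, Optional[str]]
FD = tuple[tuple[str, ...], str]


def infer_missing(rows: list[Row], fds: list[FD]) -> list[Row]:
    """Return a copy of rows with missing cells filled where an FD determines them."""
    filled = [dict(r) for r in rows]
    changed = True
    while changed:  # repeat, because one filled cell may enable another FD
        changed = False
        for lhs, rhs in fds:
            # Map each fully known left-hand value to a known right-hand value.
            known = {}
            for r in filled:
                key = tuple(r[a] for a in lhs)
                if None not in key and r[rhs] is not None:
                    known.setdefault(key, r[rhs])
            # Fill right-hand cells that another tuple determines.
            for r in filled:
                key = tuple(r[a] for a in lhs)
                if r[rhs] is None and None not in key and key in known:
                    r[rhs] = known[key]
                    changed = True
    return filled


def completeness(rows: list[Row]) -> float:
    """Fraction of non-missing cells (an assumed, simplified measure)."""
    total = sum(len(r) for r in rows)
    missing = sum(v is None for r in rows for v in r.values())
    return 1.0 if total == 0 else 1 - missing / total


# Toy data (fabricated for illustration only).
rows = [
    {"zip": "10001", "city": "New York", "state": "NY"},
    {"zip": "10001", "city": None, "state": None},
    {"zip": None, "city": "Boston", "state": None},
]
fds = [(("zip",), "city"), (("city",), "state")]

filled = infer_missing(rows, fds)
print(completeness(rows), completeness(filled))
```

In this toy input, four of the nine cells are missing at first. The second row's `city` and `state` can be derived from the first row, so only two cells remain really missing after inference, and the assumed completeness ratio rises from 5/9 to 7/9.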

Year	DOI	Venue
2016	10.1007/s11390-016-1659-x	J. Comput. Sci. Technol.
Keywords	Field	DocType
data quality, data completeness, functional dependency, data completeness model, optimal algorithm	Data mining,Data quality,Computer science,Tuple,Synthetic data,Missing data,Time complexity,Completeness (statistics),Big data,Scalability	Journal
Volume	Issue	ISSN
31	4	1860-4749
Citations	PageRank	References
1	0.35	21
Authors
3

Authors (3 rows)

Name	Order	Citations	PageRank
Yongnan Liu	1	1	0.35
Jianzhong Li	2	3196	304.46
Zhaonian Zou	3	331	15.78
