Abstract |
---|
We introduce the Data Readiness Level (DRL), a measure of how rich a piece of data is relative to the specific questions that data scientists often need to answer. We first treat the problem in full generality, explaining the desired mathematical properties and applications of DRL, and then propose and study two DRL metrics. Specifically, we define DRL as a function of at least four properties of data: Noisiness, Believability, Relevance, and Coherence. Two information-theoretic metrics, Cosine Similarity and Document Disparity, are proposed as indicators of Relevance and Coherence for a piece of data. The proposed metrics are validated through a text-based experiment using Twitter data. |
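
The abstract names Cosine Similarity as one of the proposed indicators but does not give its formula here. As a rough illustration only, the sketch below computes the standard bag-of-words cosine similarity between a question and a short text. The tokenizer, the raw term-count weighting, and the function names `tokenize` and `cosine_similarity` are assumptions made for this example and may differ from the paper's formulation. Document Disparity is not sketched because this record does not define it.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (assumed scheme)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the raw term-count vectors of two texts."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    # Only terms present in both texts contribute to the dot product.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty text has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    question = "storm warning"  # hypothetical question text
    tweet = "storm"             # hypothetical tweet text
    print(round(cosine_similarity(question, tweet), 4))
```

With raw counts the score lies between 0 (no shared terms) and 1 (identical term proportions), so under these assumptions a higher score would suggest that a text is more relevant to the question.
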
Year | Venue | Field |
---|---|---|
2017 | arXiv: Information Retrieval | Data mining, Information retrieval, Cosine similarity, Computer science, Coherence (physics), Mathematical properties, Generality |
DocType | Volume | Citations |
---|---|---|
Journal | abs/1702.02107 | 0 |
PageRank | References | Authors |
---|---|---|
0.34 | 3 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Hui Guan | 1 | 1 | 2.04 |
Thanos Gentimis | 2 | 26 | 2.90 |
Hamid Krim | 3 | 520 | 59.69 |
James Keiser | 4 | 0 | 0.34 |