Abstract |
---|
Many people who have to make informed decisions in today’s always-on culture use information extractors to feed their systems with information that comes from human-friendly documents. Unfortunately, many proposals that validate information extractors have deficiencies that make it difficult to perform homogeneous comparisons, confirm or refute performance hypotheses, or draw unbiased conclusions. Consequently, it is very difficult to select the best-performing proposal on a sound basis. The state-of-the-art validation method overcomes many deficiencies in the previous proposals, but still overlooks the following issues: completeness of the validation datasets, that is, whether they provide a complete set of annotations or not; structure of the information, that is, whether they check the structure of the record instances extracted or just the attribute instances; and, finally, how extractions and annotations are matched. The decisions made regarding the previous issues have an impact on the effectiveness results. In this article, we have exhaustively analysed the literature and we have also highlighted the main weaknesses to tackle. We present a guideline and a method to compute the effectiveness, which complements and enhances the state-of-the-art validation method. |
Year | DOI | Venue |
---|---|---|
2022 | 10.1016/j.eswa.2022.116700 | Expert Systems with Applications |
Keywords | DocType | Volume |
Web information extractors, Validation method | Journal | 199 |
ISSN | Citations | PageRank |
0957-4174 | 0 | 0.34 |
References | Authors |
0 | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Patricia Jimenez | 1 | 0 | 0.34 |
Rafael Corchuelo | 2 | 389 | 49.87 |