Abstract |
---|
User-generated content (UGC) is created, updated, and maintained by various web users, and its data quality is a major concern for all users. We observe that each Wikipedia page usually goes through a series of revision stages, gradually approaching a relatively steady quality state, and that articles of different quality classes exhibit specific evolution patterns. We propose to assess the quality of web articles by Learning Evolution Patterns (LEP). First, each article's revision history is mapped into a state sequence using a Hidden Markov Model (HMM). Second, evolution patterns are mined for each quality class, and each quality class is characterized by a set of quality corpora. Finally, an article's quality is determined probabilistically by comparing the article with the quality corpora. Our experimental results demonstrate that the LEP approach can capture a web article's quality precisely. |
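
The three steps named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the two hidden states (`building`, `stable`), the two observation symbols (`major`, `minor` edits), all probabilities, the use of state bigrams as "evolution patterns", and the Laplace-smoothed scoring are assumptions chosen only to make the pipeline concrete.

```python
import math
from collections import Counter

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state sequence for the observations."""
    # Work in log space so long revision histories do not underflow.
    scores = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    backpointers = []
    for o in obs[1:]:
        step, ptr = {}, {}
        for s in states:
            prev = max(states, key=lambda p: scores[-1][p] + math.log(trans_p[p][s]))
            step[s] = scores[-1][prev] + math.log(trans_p[prev][s]) + math.log(emit_p[s][o])
            ptr[s] = prev
        scores.append(step)
        backpointers.append(ptr)
    path = [max(states, key=lambda s: scores[-1][s])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return path[::-1]

def bigrams(seq):
    """Adjacent state pairs, used here as a stand-in for evolution patterns."""
    return list(zip(seq, seq[1:]))

def build_corpus(sequences):
    """Count state bigrams over the sequences of one quality class."""
    corpus = Counter()
    for seq in sequences:
        corpus.update(bigrams(seq))
    return corpus

def class_posteriors(seq, corpora, n_states):
    """Normalized, Laplace-smoothed likelihood of seq under each class corpus."""
    vocab = n_states * n_states  # number of possible bigrams
    log_scores = {}
    for label, corpus in corpora.items():
        total = sum(corpus.values())
        log_scores[label] = sum(
            math.log((corpus[bg] + 1) / (total + vocab)) for bg in bigrams(seq)
        )
    top = max(log_scores.values())
    weights = {k: math.exp(v - top) for k, v in log_scores.items()}
    z = sum(weights.values())
    return {k: w / z for k, w in weights.items()}

# Hypothetical model parameters and training sequences (illustrative only).
states = ("building", "stable")
start_p = {"building": 0.8, "stable": 0.2}
trans_p = {"building": {"building": 0.7, "stable": 0.3},
           "stable": {"building": 0.1, "stable": 0.9}}
emit_p = {"building": {"major": 0.8, "minor": 0.2},
          "stable": {"major": 0.1, "minor": 0.9}}

# Step 1: map a revision history (edit sizes) to a state sequence.
path = viterbi(["major", "major", "minor", "minor", "minor"],
               states, start_p, trans_p, emit_p)

# Step 2: one corpus of patterns per quality class.
corpora = {
    "high": build_corpus([["building", "stable", "stable", "stable"],
                          ["building", "building", "stable", "stable"]]),
    "low": build_corpus([["building", "building", "building", "building"],
                         ["building", "stable", "building", "building"]]),
}

# Step 3: compare the article's sequence with each corpus.
post = class_posteriors(path, corpora, n_states=len(states))
print(path, post)
```

In this sketch an article whose history settles into the `stable` state scores higher against the `high` corpus, which mirrors the abstract's observation that quality classes differ in how their revisions evolve.
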
Year | DOI | Venue |
---|---|---|
2012 | 10.1007/978-3-642-34179-3_8 | Transactions on Large-Scale Data- and Knowledge-Centered Systems |
Keywords | Field | DocType |
steady quality state, lep approach, quality class, probabilistically ranking web article, various web user, different quality class, quality corpus, evolution pattern, revision history, web article, data quality | User-generated content, Data mining, Data quality, Information retrieval, State sequence, Ranking, Computer science, Support vector machine, Hidden Markov model | Journal |
Volume | Issue | Citations |
6 | null | 0 |
PageRank | References | Authors |
0.34 | 22 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Jingyu Han | 1 | 16 | 4.67 |
Kejia Chen | 2 | 179 | 15.82 |
Dawei Jiang | 3 | 380 | 21.67 |