Abstract | ||
---|---|---|
The use of XML documents on the Internet continues to grow. This creates a need to analyze XML documents from heterogeneous sources, where documents may conform to different DTDs. In this paper, we propose a measure of the structural similarity between XML documents and DTDs that is natural to understand and fast to calculate. The measure is defined as a weighted sum of the local measures of document elements, with a weighting scheme based on their subtree sizes. The local measure of an element is defined as its edit distance against its declaration in the DTD, viewed as a regular expression. Based on this definition, we propose an algorithm for calculating the edit distance between a string and a regular expression, adapted from the algorithm used for the regular expression matching problem. The advantages of the measure are its natural definition and linear complexity. |
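
To make the local measure in the abstract concrete, the following is a minimal sketch of an edit distance between a sequence of child element names and a DTD content model treated as a regular expression. It is not the paper's algorithm, which the abstract does not spell out: it swaps in a generic shortest-path search over a Thompson-style NFA and makes no claim to linear complexity. The tuple-based model encoding, the unit edit costs, and the names `build_nfa` and `regex_edit_distance` are assumptions made for illustration.

```python
import heapq
from collections import defaultdict
from itertools import count


def build_nfa(model):
    """Build an epsilon-NFA for a content model (hypothetical encoding).

    model is ("sym", name), ("cat", [models]), ("alt", [models]),
    or ("star" | "plus" | "opt", model).
    """
    trans, eps, ids = defaultdict(list), defaultdict(list), count()

    def build(node):
        kind, arg = node
        s, t = next(ids), next(ids)
        if kind == "sym":
            trans[s].append((arg, t))
        elif kind == "cat":
            prev = s
            for child in arg:
                cs, ct = build(child)
                eps[prev].append(cs)
                prev = ct
            eps[prev].append(t)
        elif kind == "alt":
            for child in arg:
                cs, ct = build(child)
                eps[s].append(cs)
                eps[ct].append(t)
        elif kind in ("star", "plus", "opt"):
            cs, ct = build(arg)
            eps[s].append(cs)
            eps[ct].append(t)
            if kind != "plus":
                eps[s].append(t)    # zero occurrences allowed
            if kind != "opt":
                eps[ct].append(cs)  # repetition allowed
        else:
            raise ValueError(f"unknown node kind: {kind}")
        return s, t

    start, accept = build(model)
    return trans, eps, start, accept


def regex_edit_distance(seq, model):
    """Minimum number of unit-cost insertions, deletions, and substitutions
    that turn seq into some sequence accepted by model (Dijkstra search)."""
    trans, eps, start, accept = build_nfa(model)
    n = len(seq)
    dist = {(0, start): 0}
    heap = [(0, 0, start)]
    while heap:
        d, i, q = heapq.heappop(heap)
        if d > dist[(i, q)]:
            continue  # stale heap entry
        if i == n and q == accept:
            return d
        moves = [(d, i, q2) for q2 in eps[q]]
        for sym, q2 in trans[q]:
            moves.append((d + 1, i, q2))  # model needs sym, seq lacks it
            if i < n:
                moves.append((d + (seq[i] != sym), i + 1, q2))  # match or substitute
        if i < n:
            moves.append((d + 1, i + 1, q))  # seq has an extra element
        for nd, ni, nq in moves:
            if nd < dist.get((ni, nq), float("inf")):
                dist[(ni, nq)] = nd
                heapq.heappush(heap, (nd, ni, nq))
    raise RuntimeError("accept state unreachable")


# Hypothetical declaration: <!ELEMENT book (title, author+, year?)>
book = ("cat", [("sym", "title"), ("plus", ("sym", "author")), ("opt", ("sym", "year"))])
```

Under these assumptions, `regex_edit_distance(["title", "year"], book)` counts one edit for the missing `author`. A document-level score in the spirit of the abstract would then combine such per-element distances using weights derived from subtree sizes; the exact normalization is not given in the abstract and is left out here.
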
Year | DOI | Venue |
---|---|---|
2003 | 10.1007/3-540-44863-2_41 | International Conference on Computational Science |
Keywords | Field | DocType |
document element,structural similarity,linear complexity,different dtds,natural definition,heterogeneous source,distance calculation,local measure,regular expression,xml document,edit distance | Edit distance,Regular expression,XML,XML validation,Computer science,Tree (data structure),Document Structure Description,Theoretical computer science,XML schema,Document Object Model | Conference |
Volume | ISSN | ISBN |
2659 | 0302-9743 | 3-540-40196-2 |
Citations | PageRank | References |
8 | 0.58 | 5 |
Authors | ||
2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Patrick K. L. Ng | 1 | 14 | 1.11 |
Vincent T. Y. Ng | 2 | 504 | 122.85 |