Title
Structural similarity between XML documents and DTDs
Abstract
The use of XML documents on the Internet continues to grow. A need has arisen to analyze XML documents from heterogeneous sources, where documents may conform to different DTDs. In this paper, we propose a measure of the structural similarity between XML documents and DTDs that is natural to understand and fast to calculate. The measure is defined as a weighted sum of the local measures of document elements, with a weighting scheme based on their subtree sizes. The local measure of an element is defined as its edit distance against its declaration in the DTD, viewed as a regular expression. Based on this definition, we propose an algorithm for calculating the edit distance between a string and a regular expression, adapted from the algorithm used for the regular expression matching problem. The advantages of the measure are its natural definition and linear complexity.
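The local measure described in the abstract, an edit distance between an element's child sequence and its DTD content model, can be illustrated with a minimal sketch. This is not the authors' algorithm: it swaps in a Thompson-style NFA built from the content model plus a Dijkstra search over (position, state) pairs, and it makes no claim about matching the complexity stated in the abstract. The tuple encoding of the content model, the unit edit costs, and the names `build_nfa`, `regex_edit_distance`, and `BOOK_MODEL` are all assumptions made for illustration.

```python
import heapq
from itertools import count

def build_nfa(node, eps, sym, ids):
    """Thompson-style construction; returns (start, accept) for an AST node.

    Assumed AST encoding: ("sym", name), ("seq", ...), ("alt", ...),
    ("star", r), ("plus", r), ("opt", r).
    """
    kind = node[0]
    s, t = next(ids), next(ids)
    eps.setdefault(s, [])
    eps.setdefault(t, [])
    if kind == "sym":
        sym.setdefault(s, []).append((node[1], t))
    elif kind == "seq":
        prev = s
        for child in node[1:]:
            cs, ct = build_nfa(child, eps, sym, ids)
            eps[prev].append(cs)
            prev = ct
        eps[prev].append(t)
    elif kind == "alt":
        for child in node[1:]:
            cs, ct = build_nfa(child, eps, sym, ids)
            eps[s].append(cs)
            eps[ct].append(t)
    elif kind in ("star", "plus", "opt"):
        cs, ct = build_nfa(node[1], eps, sym, ids)
        eps[s].append(cs)
        eps[ct].append(t)
        if kind != "plus":
            eps[s].append(t)    # zero occurrences allowed
        if kind != "opt":
            eps[ct].append(cs)  # repetition allowed
    else:
        raise ValueError(f"unknown node kind: {kind}")
    return s, t

def regex_edit_distance(children, model):
    """Minimum number of unit-cost edits (assumed costs) that make the
    sequence `children` match the content model `model`."""
    eps, sym = {}, {}
    start, accept = build_nfa(model, eps, sym, count())
    n = len(children)
    best = {(0, start): 0}
    heap = [(0, 0, start)]
    while heap:
        d, i, q = heapq.heappop(heap)
        if d > best.get((i, q), float("inf")):
            continue  # stale heap entry
        if i == n and q == accept:
            return d
        moves = [(0, i, r) for r in eps[q]]  # free epsilon moves
        for label, r in sym.get(q, []):
            moves.append((1, i, r))  # model symbol with no matching child
            if i < n:
                # consume one child: match (0) or substitution (1)
                moves.append((0 if children[i] == label else 1, i + 1, r))
        if i < n:
            moves.append((1, i + 1, q))  # extra child not in the model
        for cost, j, r in moves:
            nd = d + cost
            if nd < best.get((j, r), float("inf")):
                best[(j, r)] = nd
                heapq.heappush(heap, (nd, j, r))
    raise AssertionError("accept state unreachable")

# Hypothetical content model for <!ELEMENT book (title, author+, chapter*)>
BOOK_MODEL = ("seq", ("sym", "title"),
              ("plus", ("sym", "author")),
              ("star", ("sym", "chapter")))

print(regex_edit_distance(["title", "author", "chapter"], BOOK_MODEL))  # 0
print(regex_edit_distance(["title", "chapter"], BOOK_MODEL))            # 1
```

A distance of 0 means the element's children are valid against the declaration; larger values count how many children would have to be added, removed, or renamed. How such distances are normalized and combined into the subtree-size weighted sum is not specified in the abstract, so that step is left out of this sketch.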
Year: 2003
DOI: 10.1007/3-540-44863-2_41
Venue: International Conference on Computational Science
Keywords: document element, structural similarity, linear complexity, different DTDs, natural definition, heterogeneous source, distance calculation, local measure, regular expression, XML document, edit distance
Field: Edit distance, Regular expression, XML, XML validation, Computer science, Tree (data structure), Document Structure Description, Theoretical computer science, XML schema, Document Object Model
DocType: Conference
Volume: 2659
ISSN: 0302-9743
ISBN: 3-540-40196-2
Citations: 8
PageRank: 0.58
References: 5
Authors: 2
Name, Order, Citations, PageRank
Patrick K. L. Ng, 1, 14, 1.11
Vincent T. Y. Ng, 2, 504, 122.85