Abstract |
---|
Synthetic data can be extremely useful in testing and evaluating algorithms, tools and systems. Most synthetic data generators available today are the result of individual benchmarking efforts. Typically, these are complex programs in which the specifications of both the structure and the contents of the data are hard-coded. As a result, it is often difficult to customize these tools to produce synthetic data tailored to specific needs. In this article, we describe the ToXgene synthetic data generator, a declarative tool for generating realistic XML data for benchmarking as well as testing purposes. We present our template specification language, which augments XML Schema with probabilistic models that guide the data-generation process. We discuss the architecture of our current implementation, and we argue for ToXgene's usefulness by discussing experimental results and describing two projects that use our tool. Copyright (C) 2006 John Wiley & Sons, Ltd. |
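The abstract describes templates that pair an XML structure with probabilistic models, for example how many times an element repeats or how its text values are distributed. The Python sketch below illustrates that general idea under stated assumptions. The dictionary-based `TEMPLATE`, the `generate` and `generate_document` helpers, and the choice of uniform and Gaussian distributions are all hypothetical. They are not ToXgene's template specification language, which, per the abstract, augments XML Schema itself.

```python
import random
import xml.etree.ElementTree as ET

# Hypothetical template (NOT ToXgene's syntax): each node names an element,
# gives a (low, high) range for how many instances to generate (drawn uniformly),
# and optionally a function that produces the element's text from an RNG.
TEMPLATE = {
    "name": "catalog",
    "children": [
        {
            "name": "book",
            "count": (2, 4),  # between 2 and 4 <book> elements per catalog
            "children": [
                {"name": "title", "count": (1, 1),
                 "text": lambda rng: "Title " + str(rng.randint(1, 1000))},
                {"name": "author", "count": (1, 3),
                 "text": lambda rng: rng.choice(["Smith", "Lee", "Garcia"])},
                # Gaussian-distributed price, clamped to stay positive.
                {"name": "price", "count": (1, 1),
                 "text": lambda rng: f"{max(0.01, rng.gauss(30.0, 5.0)):.2f}"},
            ],
        }
    ],
}


def generate(spec, rng):
    """Recursively build one element instance from a template node."""
    elem = ET.Element(spec["name"])
    if "text" in spec:
        elem.text = spec["text"](rng)
    for child_spec in spec.get("children", []):
        low, high = child_spec["count"]
        # random.Random.randint is inclusive on both ends.
        for _ in range(rng.randint(low, high)):
            elem.append(generate(child_spec, rng))
    return elem


def generate_document(template, seed):
    """Generate a whole document; a fixed seed makes the output reproducible."""
    rng = random.Random(seed)
    return ET.ElementTree(generate(template, rng))


if __name__ == "__main__":
    tree = generate_document(TEMPLATE, seed=42)
    print(ET.tostring(tree.getroot(), encoding="unicode"))
```

Running the script prints one randomly generated `catalog` document. Because the generator is seeded, the same seed always yields the same document, which is one simple way to keep synthetic benchmark data repeatable. Changing the structure or the distributions only requires editing the template, not the generator code. That separation is the kind of customization the abstract contrasts with hard-coded generators.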
Field | Value |
---|---|
Year | 2006 |
DOI | 10.1002/spe.724 |
Venue | SOFTWARE-PRACTICE & EXPERIENCE |
Keywords | XML, synthetic data, benchmarking, probabilistic generative models |
DocType | Journal |
Volume | 36 |
Issue | 10 |
ISSN | 0038-0644 |
Citations | 3 |
PageRank | 0.41 |
References | 12 |
Authors | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Denilson Barbosa | 1 | 610 | 43.52 |
Alberto O. Mendelzon | 2 | 4848 | 1394.98 |