Similarity-driven schema transformation for test data generation

  • Flexible and versatile generation of test data is an important aspect of benchmarking algorithms for data integration. This includes the generation of heterogeneous schemas, each representing a different data source of the integration benchmark. In this paper, we present our ongoing research on a novel approach for similarity-driven generation of schemas, which takes an arbitrary dataset as input, extracts its schema, and derives a set of output schemas from it. In contrast to previous solutions, we do not focus on structural transformations of relational or XML schemas, but extend the scope to contextual transformations and NoSQL data models, where the required schema information is often only implicitly defined within the data and must first be extracted. In addition, we utilize a novel method that generates multiple schemas based on user-defined heterogeneity constraints, making the generation process configurable even for non-experts.
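
To illustrate what "implicitly defined" schema information means for NoSQL data, the following is a minimal sketch of schema extraction from JSON-like documents. It is not the authors' method: the function name `extract_schema`, the path notation (`a.b` for nested fields, `a[]` for array elements), and the sample documents are all assumptions made for illustration.

```python
from collections import defaultdict


def extract_schema(documents):
    """Infer an implicit schema: field path -> set of observed type names.

    Hypothetical helper for illustration only; the paper's extraction
    step is not specified in the abstract.
    """
    schema = defaultdict(set)

    def visit(value, path):
        if isinstance(value, dict):
            # Nested object: descend into each field, extending the path.
            for key, child in value.items():
                visit(child, f"{path}.{key}" if path else key)
        elif isinstance(value, list):
            # Record the array itself, then the types of its elements.
            schema[path].add("array")
            for item in value:
                visit(item, f"{path}[]")
        else:
            # Leaf value: record its Python type name.
            schema[path].add(type(value).__name__)

    for doc in documents:
        visit(doc, "")
    return dict(schema)


# Made-up sample documents with differing structure.
docs = [
    {"name": "Ada", "address": {"city": "London"}, "tags": ["a", "b"]},
    {"name": "Bob", "age": 42},
]
schema = extract_schema(docs)
for path in sorted(schema):
    print(path, sorted(schema[path]))
```

Fields that appear in only some documents (here `age`, `address.city`, and `tags`) show up in the extracted schema alongside the common ones, which is the kind of information a subsequent schema transformation step would need as its starting point.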

Metadata
Author:Fabian Panse, Meike Klettke, Johannes Schildgen, Wolfram Wingerath
URN:urn:nbn:de:bvb:384-opus4-1172609
Frontdoor URL:https://opus.bibliothek.uni-augsburg.de/opus4/117260
ISBN:978-3-89318-085-7
ISSN:2367-2005
Parent Title (English):Proceedings of the 25th International Conference on Extending Database Technology (EDBT), 29th March - 1st April, 2022, Edinburgh, UK
Publisher:OpenProceedings
Place of publication:Konstanz
Editor:Julia Stoyanovich, Jens Teubner, Paolo Guagliardo, Milos Nikolic, Andreas Pieris, Jan Mühlig, Fatma Özcan, Sebastian Schelter, H. V. Jagadish, Meihui Zhang
Type:Conference Proceeding
Language:English
Year of first Publication:2022
Publishing Institution:Universität Augsburg
Release Date:2024/12/03
First Page:408
Last Page:413
Series:Advances in Database Technology ; 25-2
DOI:https://doi.org/10.48786/edbt.2022.31
Institutes:Fakultät für Angewandte Informatik
Fakultät für Angewandte Informatik / Institut für Informatik
Fakultät für Angewandte Informatik / Institut für Informatik / Lehrstuhl für Data Engineering
Dewey Decimal Classification:0 Computer science, information, general works / 00 Computer science, knowledge, systems / 004 Data processing; computer science
Licence:CC-BY-NC-ND 4.0: Creative Commons: Attribution - NonCommercial - NoDerivatives (with Print on Demand)