Finding optimal probabilistic generators for XML collections

  • Serge Abiteboul
  • Yael Amsterdamer
  • Daniel Deutch
  • Tova Milo
  • Pierre Senellart

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    3 Scopus citations

    Abstract

    Given a corpus of XML documents and its schema, we study the problem of finding an optimal generative probabilistic model, where optimality means maximizing the likelihood that the model generates this particular corpus. Focusing first on document structure, we present an efficient algorithm for finding the best generative probabilistic model in the absence of constraints. We then study the problem in the presence of integrity constraints, namely key, inclusion, and domain constraints, and consider two kinds of generators for this case. The first is a continuation-test generator that tests schema satisfiability while generating documents; these tests prevent the generator from producing a document that violates the constraints but, as we will see, they are computationally expensive. The second is a restart generator that may generate an invalid document and, when this happens, restarts and tries again. Finally, we consider injecting data values into the structure to obtain a full XML document, and we study different approaches for generating these values.
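
The abstract does not spell out the algorithms, so the following is only a minimal sketch of the two ideas it names, under loud assumptions. Documents are modelled as hypothetical `(label, children)` tuples, each observed child-label sequence is treated as one atomic choice (a simplification of a real schema), and the maximum-likelihood probabilities are taken to be relative frequencies, which is the standard estimate for a categorical distribution. The function names `fit_generator`, `sample`, and `restart_generate`, and the `is_valid` predicate standing in for integrity constraints, are all illustrative and not taken from the paper.

```python
import random
from collections import Counter, defaultdict


def fit_generator(corpus):
    """Estimate, per element label, a distribution over child-label sequences.

    Assumption: relative frequencies over observed sequences; this is the
    maximum-likelihood estimate for independent categorical choices.
    """
    counts = defaultdict(Counter)

    def visit(node):
        label, children = node
        counts[label][tuple(child[0] for child in children)] += 1
        for child in children:
            visit(child)

    for doc in corpus:
        visit(doc)

    model = {}
    for label, counter in counts.items():
        total = sum(counter.values())
        model[label] = {seq: n / total for seq, n in counter.items()}
    return model


def sample(model, label, rng):
    """Generate one document top-down, ignoring any constraints."""
    sequences = list(model[label])
    weights = [model[label][seq] for seq in sequences]
    chosen = rng.choices(sequences, weights=weights, k=1)[0]
    return (label, [sample(model, child, rng) for child in chosen])


def restart_generate(model, root, is_valid, rng, max_restarts=1000):
    """Restart strategy: generate freely, discard invalid documents, retry.

    `is_valid` is a hypothetical stand-in for a constraint checker.
    """
    for _ in range(max_restarts):
        doc = sample(model, root, rng)
        if is_valid(doc):
            return doc
    raise RuntimeError("no valid document found within max_restarts")


# Toy corpus of two documents (hypothetical data).
corpus = [
    ("catalog", [("book", [("title", [])]),
                 ("book", [("title", []), ("author", [])])]),
    ("catalog", [("book", [("title", [])])]),
]
model = fit_generator(corpus)

# Toy validity test: accept only catalogs with exactly one book.
doc = restart_generate(model, "catalog", lambda d: len(d[1]) == 1,
                       random.Random(0))
```

A continuation-test generator would instead check, before each choice, whether the partial document can still be completed into a valid one. That check is the expensive part the abstract refers to and is not attempted in this sketch.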

    Original language: English
    Title of host publication: Database Theory - ICDT 2012
    Subtitle of host publication: 15th International Conference on Database Theory, Proceedings
    Publisher: Association for Computing Machinery
    Pages: 127-139
    Number of pages: 13
    ISBN (Print): 9781450307918
    State: Published - 26 Mar 2012
    Event: 15th International Conference on Database Theory, ICDT 2012 - Berlin, Germany
    Duration: 26 Mar 2012 to 29 Mar 2012

    Publication series

    Name: ACM International Conference Proceeding Series

    Conference

    Conference: 15th International Conference on Database Theory, ICDT 2012
    Country/Territory: Germany
    City: Berlin
    Period: 26/03/12 to 29/03/12

    Keywords

    • Constraints
    • Generator
    • Probabilistic model
    • Schema
    • XML

    ASJC Scopus subject areas

    • Software
    • Human-Computer Interaction
    • Computer Vision and Pattern Recognition
    • Computer Networks and Communications
