TIKA-980: MicrodataContentHandler for Apache Tika

Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 1.17, 2.0.0-BETA, 2.1.0
    • Component/s: parser
    • Labels: None

    Description

      A ContentHandler for Apache Tika that builds a data structure containing Microdata item scopes and item properties. The Item* classes are borrowed from the Apache Any23 project and slightly modified to accommodate this SAX-based extractor rather than the original DOM-based one.

      The provided unit test outputs two item scopes, one for the ApacheCon Europe event and one for the ApacheCon NA event; each has a nested property.
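
      To illustrate the idea, here is a minimal sketch of a SAX handler that collects item scopes and their properties, including nested items. It is not the code from the attached patches: the class names (MicrodataSketch, Item, Handler) and the sample markup are hypothetical, the Item class is only a stand-in for the Any23-derived Item* classes, and the input is assumed to be well-formed XHTML parsed with the JDK SAX parser rather than SAX events coming from a Tika parser. Property values are taken from element text only; attribute-based values (href, content, src) and multi-name itemprop attributes are left out.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class MicrodataSketch {

    /** Hypothetical stand-in for the Any23-derived Item* classes. */
    public static final class Item {
        public final String type; // itemtype URL, or null if absent
        public final Map<String, List<Object>> properties = new LinkedHashMap<>();

        Item(String type) { this.type = type; }

        void add(String name, Object value) {
            properties.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
    }

    /** Collects top-level items while SAX events stream past. */
    public static final class Handler extends DefaultHandler {
        public final List<Item> topLevelItems = new ArrayList<>();
        // One frame per open element, so endElement knows what to close.
        private final Deque<Frame> frames = new ArrayDeque<>();
        // Item scopes that are currently open, innermost first.
        private final Deque<Item> openItems = new ArrayDeque<>();

        private static final class Frame {
            String prop;        // itemprop name, or null
            boolean opensItem;  // element carries itemscope
            StringBuilder text; // text buffer for a plain-text property
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            Frame f = new Frame();
            f.prop = atts.getValue("itemprop");
            f.opensItem = atts.getValue("itemscope") != null;
            if (f.opensItem) {
                Item item = new Item(atts.getValue("itemtype"));
                if (f.prop != null && !openItems.isEmpty()) {
                    openItems.peek().add(f.prop, item); // nested item is the property value
                } else {
                    topLevelItems.add(item);
                }
                openItems.push(item);
            } else if (f.prop != null && !openItems.isEmpty()) {
                f.text = new StringBuilder();
            }
            frames.push(f);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Text counts toward every enclosing plain-text property.
            for (Frame f : frames) {
                if (f.text != null) f.text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            Frame f = frames.pop();
            if (f.opensItem) {
                openItems.pop();
            } else if (f.text != null) {
                openItems.peek().add(f.prop, f.text.toString().trim());
            }
        }
    }

    /** Parses well-formed XHTML and returns the top-level items found. */
    public static List<Item> parse(String xhtml) throws Exception {
        Handler handler = new Handler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.topLevelItems;
    }

    public static void main(String[] args) throws Exception {
        // Illustrative markup only; not the fixture from the attached unit test.
        String event = "<div itemscope='itemscope' itemtype='http://schema.org/Event'>"
                + "<span itemprop='name'>%s</span>"
                + "<div itemprop='location' itemscope='itemscope'>"
                + "<span itemprop='name'>%s</span></div></div>";
        String xhtml = "<div>" + String.format(event, "ApacheCon Europe", "Venue A")
                + String.format(event, "ApacheCon NA", "Venue B") + "</div>";
        for (Item item : parse(xhtml)) {
            System.out.println(item.type + " " + item.properties.keySet());
        }
    }
}
```

      Keeping a stack of open item scopes is what lets a streaming handler attach a nested item to its parent without building a DOM first, which is the main difference from a DOM-based extractor.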

      Attachments

        1. TIKA-980-1.3-1.patch
          42 kB
          Markus Jelsma
        2. TIKA-980-1.3-2.patch
          48 kB
          Markus Jelsma
        3. TIKA-980-1.3-3.patch
          49 kB
          Markus Jelsma
        4. TIKA-980-1.3-4.patch
          42 kB
          Markus Jelsma
        5. TIKA-980-1.3-5.patch
          44 kB
          Markus Jelsma

            People

              Assignee: Kenneth William Krugler (kkrugler)
              Reporter: Markus Jelsma (markus17)
              Votes: 1
              Watchers: 8

              Dates

                Created:
                Updated: