  Tika / TIKA-980

MicrodataContentHandler for Apache Tika


    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 2.0, 1.17
    • Component/s: parser
    • Labels: None

      Description

      A ContentHandler for Apache Tika that builds a data structure containing Microdata item scopes and item properties. The Item* classes are borrowed from the Apache Any23 project and slightly modified to suit this SAX-based extractor, as opposed to Any23's original DOM-based extractor.

      The provided unit test outputs two item scopes, one for the ApacheCon Europe event and one for the ApacheCon NA event; each scope has a nested property.
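      The patch itself is not reproduced here. As a rough illustration of the SAX-based approach, the following is a minimal sketch that assumes well-formed XHTML input and uses only the JDK's SAX API. The classes `Item` and `MicrodataHandler` are hypothetical stand-ins, not the Any23-derived Item* classes or the TIKA-980 code. The handler keeps a stack of open item scopes. An element with `itemscope` opens a new item. That item becomes a property value of the enclosing item when the element also carries `itemprop`, and a top-level item otherwise. An element with only `itemprop` takes its trimmed text content as the value. The sketch deliberately ignores `itemref`, space-separated multiple `itemprop` names, and URL-valued properties (`href`, `src`, `content`).

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class MicrodataSketch {
    /** Hypothetical item scope: optional itemtype plus named property values. */
    static final class Item {
        final String type; // itemtype attribute, or null
        final Map<String, List<Object>> properties = new LinkedHashMap<>();
        Item(String type) { this.type = type; }
        void add(String name, Object value) {
            properties.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
        @Override public String toString() {
            return "Item" + (type == null ? "" : "<" + type + ">") + properties;
        }
    }

    /** Hypothetical SAX handler that builds Items as the document streams by. */
    static final class MicrodataHandler extends DefaultHandler {
        final List<Item> topLevel = new ArrayList<>();
        private final Deque<Item> scopes = new ArrayDeque<>();  // open itemscopes
        private final Deque<Frame> frames = new ArrayDeque<>(); // one per open element

        private static final class Frame {
            String prop;        // itemprop name on this element, or null
            Item owner;         // innermost item open when the element started
            Item opened;        // item started here by itemscope, or null
            StringBuilder text; // buffer for a plain text-valued property
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            Frame f = new Frame();
            f.prop = atts.getValue("itemprop");
            f.owner = scopes.peek(); // null outside any item
            if (atts.getValue("itemscope") != null) {
                f.opened = new Item(atts.getValue("itemtype"));
                scopes.push(f.opened);
            } else if (f.prop != null) {
                f.text = new StringBuilder();
            }
            frames.push(f);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            for (Frame f : frames) { // text counts toward every open plain property
                if (f.text != null) f.text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            Frame f = frames.pop();
            if (f.opened != null) scopes.pop();
            Object value = f.opened != null ? f.opened : (f.text != null ? f.text.toString().trim() : null);
            if (f.prop != null && f.owner != null) {
                f.owner.add(f.prop, value); // property of the enclosing item
            } else if (f.opened != null) {
                topLevel.add(f.opened);     // item not used as a property
            }
        }
    }

    static List<Item> extract(String xhtml) {
        MicrodataHandler handler = new MicrodataHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xhtml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XHTML", e);
        }
        return handler.topLevel;
    }

    // Illustrative markup (not the patch's test fixture): an event with a nested place.
    static String event(String name, String place) {
        return "<div itemscope='itemscope' itemtype='http://schema.org/Event'>"
                + "<span itemprop='name'>" + name + "</span>"
                + "<div itemprop='location' itemscope='itemscope' itemtype='http://schema.org/Place'>"
                + "<span itemprop='name'>" + place + "</span></div></div>";
    }

    static final String SAMPLE = "<html><body>" + event("ApacheCon Europe", "Europe")
            + event("ApacheCon NA", "North America") + "</body></html>";

    public static void main(String[] args) {
        for (Item item : extract(SAMPLE)) System.out.println(item);
    }
}
```

      Running `main` prints one top-level item per event, each with a nested `location` item, e.g. `Item<http://schema.org/Event>{name=[ApacheCon Europe], location=[Item<http://schema.org/Place>{name=[Europe]}]}`. The handler extends `DefaultHandler` and therefore implements `org.xml.sax.ContentHandler`, so in principle it could receive events from a Tika parser instead of the JDK parser. Whether microdata attributes survive Tika's HTML-to-XHTML mapping is not checked here.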

        Attachments

        1. TIKA-980-1.3-5.patch
          44 kB
          Markus Jelsma
        2. TIKA-980-1.3-4.patch
          42 kB
          Markus Jelsma
        3. TIKA-980-1.3-3.patch
          49 kB
          Markus Jelsma
        4. TIKA-980-1.3-2.patch
          48 kB
          Markus Jelsma
        5. TIKA-980-1.3-1.patch
          42 kB
          Markus Jelsma


              People

              • Assignee: Kenneth William Krugler (kkrugler)
              • Reporter: Markus Jelsma (markus17)
              • Votes: 2
              • Watchers: 9
