Droids / DROIDS-81

Create a document parser that doesn't HTMLify the results.

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.1.0
    • Fix Version/s: None
    • Component/s: tika
    • Labels: None

    Description

      While the TikaHTMLParser can parse PDFs, DOCs, etc., it returns their content in an HTMLified format. Solr blows up on that format, and the HTML conversion isn't always necessary anyway.
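The patch itself isn't reproduced here, so the following is only a sketch of the general idea, not the attached fix. It assumes the parser reports document content as XHTML SAX events (as Tika parsers do); a handler that keeps character data and ignores element events then yields plain text with no markup. `PlainTextSketch`, `TextOnlyHandler`, and `extractText` are hypothetical names, and the JDK's own SAX parser stands in for Tika so the example runs without extra dependencies.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration; not the code from tika-document-parser.patch.
final class PlainTextSketch {

    // Collects character data only and ignores element events,
    // so no markup reaches the output.
    static final class TextOnlyHandler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        String text() {
            return text.toString();
        }
    }

    // Assumption: the input is well-formed XHTML, standing in for the
    // SAX event stream a document parser would produce.
    static String extractText(String xhtml) throws Exception {
        TextOnlyHandler handler = new TextOnlyHandler();
        SAXParserFactory.newInstance()
                .newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.text();
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html><body><p>Hello <b>world</b></p></body></html>";
        System.out.println(extractText(xhtml));
    }
}
```

In Tika itself, `BodyContentHandler` plays a comparable role: passed to a parser in place of an HTML-serializing handler, it collects the body text without tags. Whether the attached patch takes that route is not stated in this issue.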

      Attachments

        1. tika-document-parser.patch (3 kB, Richard Frovarp)

            People

              Assignee: Unassigned
              Reporter: Richard Frovarp (rfrovarp)
