  1. Nutch
  2. NUTCH-2603

Bring back legacy pre-Tika parsers and use them as backup parsers

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 1.15
    • Fix Version/s: None
    • Component/s: parser
    • Labels: None

    Description

      There are cases where legacy parsers successfully parse documents on which Tika fails. I am attaching a list of examples of such documents. Nutch allows more than one parser to be applied to a document, in sequence, until one of them parses the document successfully. The old parsers can therefore be combined with Tika to achieve a better parsing success rate, at least until Tika is perfect.
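
      The fallback idea described above can be illustrated with a minimal Java sketch. This is not Nutch code: the DocParser interface, the parser names, and the parseWithFallback helper are all hypothetical and only show the general pattern of trying parsers in order until one succeeds.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

public class FallbackParsing {

    /** Hypothetical stand-in for a parser; not the Nutch Parser interface. */
    interface DocParser {
        String name();

        /** Returns the extracted text, or empty if this parser cannot handle the content. */
        Optional<String> parse(byte[] content);
    }

    /** Small helper that builds a DocParser from a name and a function (illustration only). */
    static DocParser parser(String name, Function<byte[], Optional<String>> fn) {
        return new DocParser() {
            @Override
            public String name() {
                return name;
            }

            @Override
            public Optional<String> parse(byte[] content) {
                return fn.apply(content);
            }
        };
    }

    /** Tries each parser in order and returns the first successful result. */
    static Optional<String> parseWithFallback(List<DocParser> chain, byte[] content) {
        for (DocParser p : chain) {
            try {
                Optional<String> result = p.parse(content);
                if (result.isPresent()) {
                    return result;
                }
            } catch (RuntimeException e) {
                // A parser that throws is treated as failed; move on to the next one.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Assumed scenario: the primary parser fails, the backup parser succeeds.
        DocParser primary = parser("primary", c -> {
            throw new IllegalStateException("cannot parse");
        });
        DocParser backup = parser("legacy", c -> Optional.of("parsed by legacy"));

        Optional<String> text = parseWithFallback(List.of(primary, backup), new byte[0]);
        System.out.println(text.orElse("all parsers failed"));
    }
}
```

      In this sketch the order of the list determines priority, so a backup parser only runs when every parser before it has failed or thrown.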

      Attachments

        1. public_docs.txt
          133 kB
          Arkadi Kosmynin

          People

            Assignee: Unassigned
            Reporter: Arkadi Kosmynin
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved: