Tika: TIKA-514

Provide constructor for AutoDetectParser that has explicit list of supported parsers


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.7
    • Fix Version/s: 0.8
    • Component/s: None
    • Labels: None

    Description

      To reduce the size of the Tika dependency chain, it's useful to exclude the supporting jars for document types you don't need to process (e.g. Microsoft Office documents, PDFs). This can easily remove 20MB of third-party jars.

      With 0.8-SNAPSHOT, the TikaConfig(Classpath) constructor finds and instantiates every Parser-based class on the classpath, which can trigger errors when the supporting third-party jars are missing.
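      The failure mode can be sketched as an eager loader that instantiates every discovered class by name and aborts on the first one it cannot load. This is a hypothetical model, not Tika's TikaConfig code: the class names are made up, and a name missing from the classpath stands in for a parser whose supporting jar was excluded. (In practice the parser class itself may load and the error only surfaces as a NoClassDefFoundError once its dependency is touched; the sketch treats both the same way.)

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of eager, classpath-wide parser discovery (not Tika code).
public class EagerDiscoverySketch {

    // Instantiates every discovered class by name. Any class that cannot be
    // loaded (e.g. because a supporting jar was left out) aborts the whole
    // configuration, even if the caller never needs that document type.
    static List<Object> instantiateAll(List<String> discoveredClassNames) {
        List<Object> instances = new ArrayList<>();
        for (String name : discoveredClassNames) {
            try {
                instances.add(Class.forName(name).getDeclaredConstructor().newInstance());
            } catch (ReflectiveOperationException | LinkageError e) {
                throw new IllegalStateException("Cannot instantiate " + name, e);
            }
        }
        return instances;
    }

    public static void main(String[] args) {
        // java.lang.StringBuilder stands in for a parser whose dependencies are present;
        // the second, hypothetical name stands in for one whose jar was excluded.
        List<String> discovered = List.of(
                "java.lang.StringBuilder",
                "com.example.hypothetical.PdfLikeParser");
        try {
            instantiateAll(discovered);
        } catch (IllegalStateException e) {
            // prints "Cannot instantiate com.example.hypothetical.PdfLikeParser (ClassNotFoundException)"
            System.out.println(e.getMessage() + " (" + e.getCause().getClass().getSimpleName() + ")");
        }
    }
}
```

      One unloadable class is enough to make the whole discovery step fail, which is why trimming jars breaks the default configuration.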

      One solution, proposed by Jukka, is to provide an alternative AutoDetectParser constructor that takes the list of supported parsers and so avoids creating the default TikaConfig.
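      A minimal model of the proposed alternative: the auto-detecting parser receives its parsers from the caller and builds its media-type lookup only from them, so nothing else is discovered or instantiated. The `SimpleParser` interface, the `ExplicitAutoDetectParser` class and the media-type strings are hypothetical stand-ins, not Tika's API, and type detection is reduced to a caller-supplied media type.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical stand-in for a Tika-style Parser.
interface SimpleParser {
    Set<String> supportedTypes();   // media types this parser handles
    String parse(String content);   // toy "parse" step
}

// Hypothetical auto-detecting dispatcher built from an explicit parser list.
class ExplicitAutoDetectParser {
    private final Map<String, SimpleParser> byType = new LinkedHashMap<>();

    // Only the parsers passed in are registered; no classpath scan takes place,
    // so parsers whose third-party jars are absent are never loaded.
    ExplicitAutoDetectParser(SimpleParser... parsers) {
        for (SimpleParser p : parsers) {
            for (String type : p.supportedTypes()) {
                byType.putIfAbsent(type, p); // first parser listed wins
            }
        }
    }

    // Detection is simplified to a caller-supplied media type.
    Optional<String> parse(String mediaType, String content) {
        return Optional.ofNullable(byType.get(mediaType)).map(p -> p.parse(content));
    }
}

public class ExplicitParsersSketch {
    public static void main(String[] args) {
        SimpleParser text = new SimpleParser() {
            public Set<String> supportedTypes() { return Set.of("text/plain"); }
            public String parse(String content) { return content.trim(); }
        };
        ExplicitAutoDetectParser parser = new ExplicitAutoDetectParser(text);
        System.out.println(parser.parse("text/plain", "  hello  "));     // Optional[hello]
        System.out.println(parser.parse("application/pdf", "%PDF-1.4")); // Optional.empty
    }
}
```

      Unsupported types simply yield an empty result instead of a startup failure. Later Tika releases document an `AutoDetectParser(Parser... parsers)` constructor for this purpose; check the Javadoc of the version you use rather than relying on this sketch.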

      Attachments

        1. TIKA-514.patch (7 kB, Kenneth William Krugler)


          People

            Assignee: Kenneth William Krugler (kkrugler)
            Reporter: Kenneth William Krugler (kkrugler)
            Votes: 0
            Watchers: 0
