Tika / TIKA-2276

Try to be more parsimonious creating TikaConfigs and ParseContexts

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0, 1.15
    • Component/s: None
    • Labels: None

      Description

      Running AutoDetectParser against the files in our unit tests (around 600 files*) creates 701 new TikaConfig instances and takes around 20 seconds. If we modify AutoDetectParser to pass its TikaConfig via the ParseContext when one isn't already specified, that drops to 234 instantiations, and the parse time drops to ~17 seconds.

      Let's make this simple change and look for other areas to decrease the number of times our parsers are creating a new TikaConfig.

      *Note: I did not include the testCHM2.chm monster in these runs.
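
      The proposed change can be sketched roughly as follows. This is not the actual Tika code: SketchConfig, SketchContext, SketchChildParser and SketchAutoDetectParser are hypothetical stand-ins for TikaConfig, ParseContext, an embedded parser and AutoDetectParser, and the shareConfig flag exists only to compare the two behaviours. The idea is that the outer parser puts its own config into the context when none is present, so parsers further down can reuse it instead of building a new one.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for TikaConfig: counts how often it is constructed.
final class SketchConfig {
    static int instantiations = 0;

    SketchConfig() {
        instantiations++;
    }
}

// Hypothetical stand-in for ParseContext: a small type-keyed map.
final class SketchContext {
    private final Map<Class<?>, Object> entries = new HashMap<>();

    <T> void set(Class<T> key, T value) {
        entries.put(key, value);
    }

    <T> T get(Class<T> key) {
        return key.cast(entries.get(key)); // null when nothing was set
    }
}

// Hypothetical embedded parser that needs a config to do its work.
final class SketchChildParser {
    SketchConfig parse(SketchContext context) {
        SketchConfig config = context.get(SketchConfig.class);
        if (config == null) {
            // Fallback: nothing was passed in, so build a fresh config.
            config = new SketchConfig();
        }
        return config;
    }
}

// Hypothetical stand-in for AutoDetectParser.
final class SketchAutoDetectParser {
    private final SketchConfig config = new SketchConfig();
    private final SketchChildParser child = new SketchChildParser();
    private final boolean shareConfig;

    SketchAutoDetectParser(boolean shareConfig) {
        this.shareConfig = shareConfig;
    }

    SketchConfig parse(SketchContext context) {
        // Only fill in the config if the caller has not already specified one.
        if (shareConfig && context.get(SketchConfig.class) == null) {
            context.set(SketchConfig.class, config);
        }
        return child.parse(context);
    }
}

final class ConfigReuseSketch {
    // Parses `files` documents and reports how many configs were created.
    static int countInstantiations(boolean shareConfig, int files) {
        SketchConfig.instantiations = 0;
        SketchAutoDetectParser parser = new SketchAutoDetectParser(shareConfig);
        for (int i = 0; i < files; i++) {
            parser.parse(new SketchContext());
        }
        return SketchConfig.instantiations;
    }

    public static void main(String[] args) {
        System.out.println("without sharing: " + countInstantiations(false, 3)); // 4
        System.out.println("with sharing: " + countInstantiations(true, 3));     // 1
    }
}
```

      In this toy setup, one config is created per parsed file unless the outer parser shares its own. The real numbers above (701 vs. 234) show that other code paths in Tika still create configs, which is why the issue also asks to look for further areas to trim.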

            People

            • Assignee: Tim Allison (tallison)
            • Reporter: Tim Allison (tallison)