Tika / TIKA-2511

Slowness parsing SQLite database file

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.16
    • Fix Version/s: 1.17
    • Component/s: None
    • Labels: None

      Description

      Parsing of the attached urlclassifier3.sqlite database is approximately 5 times slower in Tika 1.16 than it was in 1.14.

      I've done some profiling, and the problem appears to lie in how often a new TikaConfig instance gets created (see the attached screenshot-1.png).

      I notice that EmbeddedDocumentUtil has two methods for getting a TikaConfig, only one of which caches the config instance. Are both methods needed? Why does EmbeddedDocumentUtil.getExtension() use the version that doesn't cache the config?
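
      To illustrate why the uncached path could matter, here is a minimal, self-contained sketch. It does not use Tika itself: ExpensiveConfig, EmbeddedHelperSketch, and ConfigCachingSketch are hypothetical stand-ins invented for this illustration, and the extension mapping is made up. It only counts how many config objects get built when a helper is called once per embedded item, with and without caching.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a configuration object that is costly to build.
// It plays the role TikaConfig plays in this report; it is NOT the Tika class.
final class ExpensiveConfig {
    static final AtomicInteger CONSTRUCTIONS = new AtomicInteger();

    ExpensiveConfig() {
        // Count how often the (assumed) costly setup would run.
        CONSTRUCTIONS.incrementAndGet();
    }

    // Made-up mapping, only here so the helper has something to look up.
    String extensionFor(String mimeType) {
        return "application/x-sqlite3".equals(mimeType) ? ".sqlite" : ".bin";
    }
}

// Hypothetical helper with two ways of obtaining the config,
// mirroring the cached/uncached split described above.
final class EmbeddedHelperSketch {
    private ExpensiveConfig cached; // built lazily, then reused

    ExpensiveConfig getConfigUncached() {
        return new ExpensiveConfig(); // new instance on every call
    }

    ExpensiveConfig getConfigCached() {
        if (cached == null) {
            cached = new ExpensiveConfig();
        }
        return cached;
    }

    String getExtension(String mimeType, boolean useCache) {
        ExpensiveConfig config = useCache ? getConfigCached() : getConfigUncached();
        return config.extensionFor(mimeType);
    }
}

final class ConfigCachingSketch {
    // Returns how many configs were constructed for the given number of lookups.
    static int constructionsFor(boolean useCache, int lookups) {
        ExpensiveConfig.CONSTRUCTIONS.set(0);
        EmbeddedHelperSketch helper = new EmbeddedHelperSketch();
        for (int i = 0; i < lookups; i++) {
            helper.getExtension("application/x-sqlite3", useCache);
        }
        return ExpensiveConfig.CONSTRUCTIONS.get();
    }

    public static void main(String[] args) {
        System.out.println("uncached: " + constructionsFor(false, 1000)); // prints "uncached: 1000"
        System.out.println("cached:   " + constructionsFor(true, 1000));  // prints "cached:   1"
    }
}
```

      If a database yields many embedded items and each one goes through the uncached path, the construction cost is paid once per item rather than once per parse, which would be consistent with the profile described above. The sketch's lazy caching is not thread-safe; it is meant only to show the difference in construction counts.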

        Attachments

        1. urlclassifier3.sqlite
          15.00 MB
          Eamonn Saunders
        2. screenshot-1.png
          85 kB
          Eamonn Saunders

            People

            • Assignee: Unassigned
            • Reporter: Eamonn Saunders (esaunders)
            • Votes: 0
            • Watchers: 3
