Tika / TIKA-2511

Slowness parsing SQLite database file

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.16
    • Fix Version/s: 1.17
    • Component/s: None
    • Labels: None

    Description

      Parsing of the attached urlclassifier3.sqlite database is approximately 5 times slower in Tika 1.16 than it was in 1.14.

      I've performed some profiling, and the problem appears to lie in the number of times a new TikaConfig instance gets created. See the attached screenshot (screenshot-1.png).

      I notice that EmbeddedDocumentUtil has two methods for getting a TikaConfig, only one of which caches the config instance. Are both methods needed? Why does EmbeddedDocumentUtil.getExtension() use the one that doesn't cache the config?
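
      To illustrate why the choice between the two lookups could matter, here is a minimal, self-contained sketch. It does not use Tika's classes: ExpensiveConfig, UncachedLookup, CachedLookup and constructionsFor are hypothetical stand-ins, and the loop count simply imitates one lookup per embedded item. It only shows how a non-caching getter multiplies construction cost, assuming config construction is the expensive step the profile suggests.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-ins for illustration only; none of these names are Tika's API.
final class ConfigCachingSketch {

    /** Stand-in for a configuration object that is costly to construct. */
    static final class ExpensiveConfig {
        static final AtomicInteger CONSTRUCTED = new AtomicInteger();

        ExpensiveConfig() {
            // Count constructions so the cost of each pattern is visible.
            CONSTRUCTED.incrementAndGet();
        }
    }

    /** Builds a fresh config on every call. */
    static final class UncachedLookup {
        ExpensiveConfig getConfig() {
            return new ExpensiveConfig();
        }
    }

    /** Builds the config on first use and reuses it afterwards. */
    static final class CachedLookup {
        private ExpensiveConfig config;

        synchronized ExpensiveConfig getConfig() {
            if (config == null) {
                config = new ExpensiveConfig();
            }
            return config;
        }
    }

    /** Returns how many configs were constructed while serving the given number of lookups. */
    static int constructionsFor(int lookups, boolean cached) {
        ExpensiveConfig.CONSTRUCTED.set(0);
        UncachedLookup uncachedLookup = new UncachedLookup();
        CachedLookup cachedLookup = new CachedLookup();
        for (int i = 0; i < lookups; i++) {
            if (cached) {
                cachedLookup.getConfig();
            } else {
                uncachedLookup.getConfig();
            }
        }
        return ExpensiveConfig.CONSTRUCTED.get();
    }

    public static void main(String[] args) {
        System.out.println("uncached constructions: " + constructionsFor(1000, false));
        System.out.println("cached constructions:   " + constructionsFor(1000, true));
    }
}
```

      With 1000 lookups, the uncached variant constructs 1000 configs and the cached variant constructs one. If getExtension() is called once per embedded item, the same ratio would apply to a database with many rows.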

      Attachments

        1. urlclassifier3.sqlite
          15.00 MB
          Eamonn Saunders
        2. screenshot-1.png
          85 kB
          Eamonn Saunders

          People

            Assignee: Unassigned
            Reporter: Eamonn Saunders (esaunders)
            Votes: 0
            Watchers: 3
