  1. Commons Compress
  2. COMPRESS-285

checking of availability of XZ compression is expensive - result should be reused


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.5, 1.6, 1.7, 1.8
    • Fix Version/s: 1.9
    • Component/s: Compressors
    • Labels:
    • Environment: Linux 64-bit, Java 7, GlassFish, Solr, Tika

      Description

      I use Solr with Apache Tika to index documents. Tika uses commons-compress to handle compressed files. Using the sampler in jvisualvm, I have seen that a noticeable share of the time (5-7%) during my tests is spent in XZUtils.isXZCompressionAvailable, because XZ compression is not available. My guess is that on every call the class loaders spend time searching for the missing classes before a NoClassDefFoundError is thrown.

      I think the result of the first check should be stored and reused.
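
      A minimal sketch of what storing and reusing the result could look like. This is not the actual XZUtils code: the class CachedXZAvailability, the method probeCount, and the reflective lookup of org.tukaani.xz.XZInputStream as the probe are assumptions made for illustration. The nested holder class makes the JVM run the expensive lookup only once, on first use.

      ```java
      import java.util.concurrent.atomic.AtomicInteger;

      /** Hypothetical helper for illustration; not the Commons Compress XZUtils class. */
      final class CachedXZAvailability {

          // Assumed probe target: a class from the XZ for Java library.
          private static final String PROBE_CLASS = "org.tukaani.xz.XZInputStream";

          // Counts how often the expensive lookup really runs (illustration only).
          private static final AtomicInteger PROBES = new AtomicInteger();

          private CachedXZAvailability() { }

          /** Initialization-on-demand holder: initialized once, on first access. */
          private static final class Holder {
              static final boolean AVAILABLE = probe();
          }

          // The expensive part: asks the class loader for a class that may be missing.
          private static boolean probe() {
              PROBES.incrementAndGet();
              try {
                  Class.forName(PROBE_CLASS, false,
                          CachedXZAvailability.class.getClassLoader());
                  return true;
              } catch (ClassNotFoundException | LinkageError e) {
                  return false;
              }
          }

          /** Returns the cached answer; only the first call triggers the probe. */
          static boolean isXZCompressionAvailable() {
              return Holder.AVAILABLE;
          }

          /** Number of class loader lookups performed so far. */
          static int probeCount() {
              return PROBES.get();
          }
      }
      ```

      The trade-off of such a cache is that the answer is fixed for the lifetime of the class: if the XZ library only becomes visible later (for example through a different class loader), the stored result would be stale.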

      Here is the stack trace (just to show how Tika uses commons-compress):
      org.apache.commons.compress.compressors.xz.XZUtils.isXZCompressionAvailable(XZUtils.java:52)
      at org.apache.commons.compress.compressors.CompressorStreamFactory.createCompressorInputStream(CompressorStreamFactory.java:140)
      at org.apache.tika.parser.pkg.ZipContainerDetector.detectCompressorFormat(ZipContainerDetector.java:95)
      at org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:81)
      at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:61)

            People

            • Assignee: Unassigned
            • Reporter: Wojciech Łozowicki (lsx)
            • Votes: 0
            • Watchers: 3
