Apache Knox / KNOX-1530

Improve Gzip Compression Handling Performance


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Fix Version: 1.2.0

    Description

      While looking at KNOX-1524, I found that requesting compressed results can cause performance impacts. Knox currently does the following:

      This led to recompressing some streams (KNOX-732, KNOX-855, KNOX-856) based on MIME types. Even if we disable HttpClient content compression (disableContentCompression), KNOX-565 added the following steps, which should only come into play when the above HttpClient transparent decompression is disabled (or for multipart Gzip files, see KNOX-1518):

      • Try to decompress the stream
        • Currently uses try/catch
      • Run any rewrite filter rules
      • If decompressed, recompress the stream
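
      The steps above can be sketched roughly as follows. This is a minimal illustration using only java.util.zip, not the actual Knox code: the class name, the process method, and the rewrite function passed in are all hypothetical, and the body is buffered in memory for brevity.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.function.UnaryOperator;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.ZipException;

// Hypothetical sketch of the decompress / rewrite / recompress sequence.
final class DecompressRewriteRecompress {

    /** Gzip-compresses a byte array. */
    static byte[] gzip(byte[] plain) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(plain);
        }
        return buffer.toByteArray();
    }

    /** Decompresses a gzip byte array; fails if the input is not gzip. */
    static byte[] gunzip(byte[] compressed) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[8192];
            int read;
            while ((read = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
        }
        return buffer.toByteArray();
    }

    /** Always attempts decompression, rewrites, then recompresses if needed. */
    static byte[] process(byte[] body, UnaryOperator<String> rewrite) throws IOException {
        byte[] plain;
        boolean wasCompressed;
        try {
            // Step 1: try to decompress; the exception doubles as format detection.
            plain = gunzip(body);
            wasCompressed = true;
        } catch (ZipException | EOFException notGzip) {
            plain = body;
            wasCompressed = false;
        }
        // Step 2: run the rewrite (here a plain string function stands in for filter rules).
        String rewritten = rewrite.apply(new String(plain, StandardCharsets.UTF_8));
        byte[] out = rewritten.getBytes(StandardCharsets.UTF_8);
        // Step 3: recompress only if the input was compressed.
        return wasCompressed ? gzip(out) : out;
    }
}
```

      Note that in this sketch the decompress and recompress cost is paid for every gzip body, even when the rewrite function changes nothing.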

      For many use cases, there is no reason to decompress and recompress the same stream because no rewrite rules apply. One example is Hive, where beeline requests compression and HiveServer2 added support for returning compressed results with HIVE-17194. Another is WebHDFS, where we don't want to change the content going back to the client.

      I am planning to address this in a few pieces:

      • Determine if any rewrite rules apply before decompressing
        • If rewrite rules apply, then decompress and recompress as before
        • If rewrite rules do not apply, then copy stream as is
      • Remove gzip filter added by KNOX-732
        • Figure out if there is another code path where decompress/recompress should happen
        • We should not have to rely on Jetty to recompress content
      • Disable httpclient content compression
        • Need to make sure we handle decompress/recompress where necessary
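
      The first piece of the plan could look something like the sketch below. It assumes the caller already knows whether a rewrite rule matches and whether the body is gzip encoded; the class name, the relay method, and the Optional rule parameter are hypothetical and are not taken from the Knox code base.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.function.UnaryOperator;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: decide on rewrite rules before touching the compression.
final class ConditionalGzipRewrite {

    /** Copies all bytes from in to out without interpreting them. */
    static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] chunk = new byte[8192];
        int read;
        while ((read = in.read(chunk)) != -1) {
            out.write(chunk, 0, read);
        }
    }

    /**
     * Relays an upstream body to the client. An empty rule means no rewrite
     * rule applies, so the bytes are copied as is, compressed or not.
     */
    static void relay(InputStream upstream, OutputStream downstream, boolean gzipEncoded,
                      Optional<UnaryOperator<String>> rule) throws IOException {
        if (!rule.isPresent()) {
            // No rule applies: skip decompress/recompress entirely.
            copy(upstream, downstream);
            return;
        }
        // A rule applies: decompress (if needed), rewrite, recompress as before.
        InputStream plainIn = gzipEncoded ? new GZIPInputStream(upstream) : upstream;
        ByteArrayOutputStream plain = new ByteArrayOutputStream();
        copy(plainIn, plain);
        String rewritten = rule.get().apply(new String(plain.toByteArray(), StandardCharsets.UTF_8));
        byte[] body = rewritten.getBytes(StandardCharsets.UTF_8);
        if (gzipEncoded) {
            GZIPOutputStream gz = new GZIPOutputStream(downstream);
            gz.write(body);
            gz.finish(); // write the gzip trailer without closing the caller's stream
        } else {
            downstream.write(body);
        }
    }
}
```

      With this shape, the pass-through branch returns exactly the bytes the backend sent, which is the behavior wanted for cases like Hive and WebHDFS.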

      With all 3 improvements in place we should end up with:

      • One place where gzip decompress/recompress happens
      • Only decompress/recompress if rewrite rules match
      • Performance increases due to skipping unnecessary decompress/recompress


            People

              Assignee: Kevin Risden (krisden)
              Reporter: Kevin Risden (krisden)