IMPALA-7708

Switch to faster compression strategy for incremental stats


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 3.1.0
    • Fix Version/s: Impala 3.1.0
    • Component/s: Catalog
    • Labels: None

    Description

      Currently, we set the Deflater compression level to BEST_COMPRESSION by default:

      public static byte[] deflateCompress(byte[] input) {
          if (input == null) return null;
          ByteArrayOutputStream bos = new ByteArrayOutputStream(input.length);
          // TODO: Benchmark other compression levels.
          DeflaterOutputStream stream =
              new DeflaterOutputStream(bos, new Deflater(Deflater.BEST_COMPRESSION));
      

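      In some experiments, we noticed that the fastest compression mode (BEST_SPEED) is ~8x faster than BEST_COMPRESSION (11 s vs. 92 s in the results below), at the cost of ~4 percentage points of compression ratio (output about 9% larger, 212 MB vs. 194 MB).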
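      As a rough illustration of the proposed change, the sketch below takes the Deflater level as a parameter so that callers can pass Deflater.BEST_SPEED. This is not the actual Impala code: the class name CompressionSketch, the two-argument deflateCompress signature, and the inflate helper are assumptions made for this example only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper class for illustration; not the actual Impala utility.
class CompressionSketch {
  // Compresses input at the given Deflater level, e.g. Deflater.BEST_SPEED.
  static byte[] deflateCompress(byte[] input, int level) {
    if (input == null) return null;
    ByteArrayOutputStream bos = new ByteArrayOutputStream(input.length);
    Deflater deflater = new Deflater(level);
    // Closing the stream finishes the compressed data before bos is read.
    try (DeflaterOutputStream stream = new DeflaterOutputStream(bos, deflater)) {
      stream.write(input);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } finally {
      deflater.end();  // A caller-supplied Deflater is not ended by the stream.
    }
    return bos.toByteArray();
  }

  // Inverse of deflateCompress; used here only to check the round trip.
  static byte[] inflate(byte[] compressed) {
    if (compressed == null) return null;
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (InflaterInputStream in =
        new InflaterInputStream(new ByteArrayInputStream(compressed))) {
      byte[] buf = new byte[4096];
      int n;
      while ((n = in.read(buf)) != -1) bos.write(buf, 0, n);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bos.toByteArray();
  }

  public static void main(String[] args) {
    // Repetitive sample payload; real incremental stats will compress differently.
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < 10000; i++) sb.append("partition=").append(i % 50).append(';');
    byte[] data = sb.toString().getBytes(java.nio.charset.StandardCharsets.UTF_8);

    byte[] fast = deflateCompress(data, Deflater.BEST_SPEED);
    byte[] best = deflateCompress(data, Deflater.BEST_COMPRESSION);
    System.out.println("input bytes:            " + data.length);
    System.out.println("BEST_SPEED bytes:       " + fast.length);
    System.out.println("BEST_COMPRESSION bytes: " + best.length);
  }
}
```

      Passing the level explicitly keeps the existing call shape while making the speed/size trade-off a caller decision; whether the real fix hard-codes BEST_SPEED or exposes a setting is not stated in this issue.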

      Here are some results from a real-world table with 3000 partitions that have incremental stats.

       

      Compression mode           Time taken for serialization (seconds)   Output size (MB)
      Gzip best compression      92                                       194
      Gzip fastest compression   11                                       212
      Gzip default compression   57                                       195
      No compression             5                                        452
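      A comparison of this kind could be reproduced with a small harness along the following lines. This is only a sketch: the class CompressionLevelBench, its methods, and the synthetic input are assumptions for illustration, and it times raw Deflater calls on generated data rather than the serialization of real incremental stats, so its numbers will not match the table above.

```java
import java.util.Random;
import java.util.zip.Deflater;

// Hypothetical micro-harness; not part of Impala.
class CompressionLevelBench {
  // Returns the number of compressed bytes produced for input at the given level.
  static int compressedSize(byte[] input, int level) {
    Deflater deflater = new Deflater(level);
    try {
      deflater.setInput(input);
      deflater.finish();
      byte[] buf = new byte[8192];
      int total = 0;
      // Drain the compressor; the output itself is discarded, only counted.
      while (!deflater.finished()) {
        total += deflater.deflate(buf);
      }
      return total;
    } finally {
      deflater.end();  // Release native zlib resources.
    }
  }

  // Deterministic, moderately compressible input (8-symbol alphabet).
  static byte[] syntheticInput(int n) {
    Random random = new Random(42);
    byte[] out = new byte[n];
    for (int i = 0; i < n; i++) out[i] = (byte) ('a' + random.nextInt(8));
    return out;
  }

  public static void main(String[] args) {
    byte[] input = syntheticInput(16 << 20);
    int[] levels = {
        Deflater.BEST_COMPRESSION, Deflater.BEST_SPEED, Deflater.DEFAULT_COMPRESSION};
    String[] names = {"best compression", "fastest compression", "default compression"};
    System.out.printf("%-22s %10s %14s%n", "mode", "time (ms)", "output (bytes)");
    for (int i = 0; i < levels.length; i++) {
      long start = System.nanoTime();
      int size = compressedSize(input, levels[i]);
      long millis = (System.nanoTime() - start) / 1_000_000;
      System.out.printf("%-22s %10d %14d%n", names[i], millis, size);
    }
    System.out.printf("%-22s %10s %14d%n", "no compression", "-", input.length);
  }
}
```

      A single timed pass like this is sensitive to JIT warm-up, so repeated runs or a benchmarking framework would give more trustworthy timings.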

       

       

       


          People

            Assignee: Bharath Vissapragada (bharathv)
            Reporter: Bharath Vissapragada (bharathv)
