  1. Hadoop Common
  2. HADOOP-13126

Add Brotli compression codec

Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.7.2
    • Fix Version/s: None
    • Component/s: io

    Description

      I've been testing Brotli, a new LZ77-based compression library from Google. Google's Brotli benchmarks look really good, and in our own tests we're also seeing a significant improvement in compressed size, compression speed, or both.

      Brotli preliminary test results
      [blue@work Downloads]$ time parquet from test.parquet -o test.snappy.parquet --compression-codec snappy --overwrite                      
      
      real    1m17.106s
      user    1m30.804s
      sys     0m4.404s
      
      [blue@work Downloads]$ time parquet from test.parquet -o test.br.parquet --compression-codec brotli --overwrite                         
      
      real    1m16.640s
      user    1m24.244s
      sys     0m6.412s
      
      [blue@work Downloads]$ time parquet from test.parquet -o test.gz.parquet --compression-codec gzip --overwrite                            
      
      real    3m39.496s
      user    3m48.736s
      sys     0m3.880s
      
      [blue@work Downloads]$ ls -l
      -rw-r--r-- 1 blue blue 1068821936 May 10 11:06 test.br.parquet
      -rw-r--r-- 1 blue blue 1421601880 May 10 11:10 test.gz.parquet
      -rw-r--r-- 1 blue blue 2265950833 May 10 10:30 test.snappy.parquet
      

      At quality 1, Brotli is as fast as snappy (1m16.6s vs. 1m17.1s wall-clock time) and produces a file about 25% smaller than gzip-9 (1.07 GB vs. 1.42 GB). Another test resulted in a slightly larger Brotli file than gzip produced, but Brotli was 4x faster. I'd like to get this compression codec into Hadoop.
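
      For anyone who wants to check the comparison, here is a minimal sketch that recomputes the size and wall-clock ratios from the numbers in the transcript above. The helper names (`size_ratio`, `time_ratio`) are made up for illustration and are not part of any patch; the figures are copied from the `time` and `ls -l` output.

      ```python
      # Wall-clock ("real") times in seconds and output sizes in bytes,
      # copied from the test transcript above.
      results = {
          "snappy": {"real_s": 1 * 60 + 17.106, "bytes": 2265950833},
          "brotli": {"real_s": 1 * 60 + 16.640, "bytes": 1068821936},
          "gzip":   {"real_s": 3 * 60 + 39.496, "bytes": 1421601880},
      }


      def size_ratio(codec, baseline):
          """Output size of `codec` relative to `baseline` (below 1.0 means smaller)."""
          return results[codec]["bytes"] / results[baseline]["bytes"]


      def time_ratio(codec, baseline):
          """Wall-clock time of `codec` relative to `baseline` (above 1.0 means slower)."""
          return results[codec]["real_s"] / results[baseline]["real_s"]


      if __name__ == "__main__":
          for other in ("gzip", "snappy"):
              print(
                  f"brotli vs {other}: "
                  f"size x{size_ratio('brotli', other):.3f}, "
                  f"time x{time_ratio('brotli', other):.3f}"
              )
      ```

      These ratios cover a single run on a single file, so they are indicative only.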

      Brotli is released under the MIT license, and the JNI library jbrotli is licensed under the Apache License 2.0 (ALv2).

      Attachments

        1. HADOOP-13126.5.patch
          26 kB
          Ryan Blue
        2. HADOOP-13126.4.patch
          21 kB
          Ryan Blue
        3. HADOOP-13126.3.patch
          21 kB
          Ryan Blue
        4. HADOOP-13126.2.patch
          19 kB
          Ryan Blue
        5. HADOOP-13126.1.patch
          19 kB
          Ryan Blue

            People

              Assignee: Ryan Blue (rdblue)
              Reporter: Ryan Blue (rdblue)
              Votes: 2
              Watchers: 22

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 50m