PARQUET-118: Provide option to use on-heap buffers for Snappy compression/decompression


    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.6.0
    • Fix Version/s: None
    • Component/s: parquet-mr
    • Labels: None

      Description

      The current code uses direct (off-heap) buffers for decompression. If many decompressors are instantiated across multiple threads, and/or the objects being decompressed are large, this can lead to a huge amount of off-heap allocation by the JVM. The problem is exacerbated when there is no heap contention overall, since no GC will run, and direct buffer memory is only released when the owning ByteBuffer objects are collected.
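
      To make the failure mode concrete, here is a self-contained sketch (the DirectBufferDemo class is illustrative, not parquet-mr code). Each direct allocation reserves native memory, while the heap only sees a tiny ByteBuffer wrapper object, so a quiet heap gives the GC no reason to run and release anything:

          import java.nio.ByteBuffer;

          public class DirectBufferDemo {
              public static void main(String[] args) {
                  // Each call reserves 64 MB of native memory. The heap cost is
                  // only the small ByteBuffer wrapper, so heap usage stays near
                  // zero and no GC cycle ever frees the native allocations.
                  // (Direct allocations are capped by -XX:MaxDirectMemorySize,
                  // which defaults to roughly the max heap size.)
                  ByteBuffer[] buffers = new ByteBuffer[16];
                  for (int i = 0; i < buffers.length; i++) {
                      buffers[i] = ByteBuffer.allocateDirect(64 * 1024 * 1024);
                  }
                  System.out.println("Holding ~1 GB of native memory off-heap.");
              }
          }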

      It would be nice if there were a flag we could use to simply allocate on-heap buffers here:

      https://github.com/apache/incubator-parquet-mr/blob/master/parquet-hadoop/src/main/java/parquet/hadoop/codec/SnappyDecompressor.java#L28
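
      A minimal sketch of what such a flag could look like (the property name parquet.snappy.use-heap-buffers and the BufferAllocator helper are hypothetical, not existing parquet-mr API):

          import java.nio.ByteBuffer;

          public final class BufferAllocator {
              // Hypothetical switch; parquet-mr does not currently define this property.
              private static final boolean USE_HEAP =
                  Boolean.getBoolean("parquet.snappy.use-heap-buffers");

              private BufferAllocator() {}

              /** On-heap when the flag is set; otherwise the current direct behavior. */
              static ByteBuffer allocate(int capacity) {
                  return USE_HEAP
                      ? ByteBuffer.allocate(capacity)        // byte[]-backed, reclaimed by normal GC
                      : ByteBuffer.allocateDirect(capacity); // native memory outside the heap
              }
          }

      One caveat: as far as I know, snappy-java's ByteBuffer-based compress/uncompress calls require direct buffers, so an on-heap path would probably have to go through its byte[] overloads (or copy between the heap and a small pooled direct buffer).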

      We ran into an issue today where these buffers totaled a very large amount of storage and caused our Java processes (running within containers) to be terminated by the kernel OOM-killer.


    People

    • Assignee: Unassigned
    • Reporter: Patrick Wendell (pwendell)
    • Votes: 8
    • Watchers: 16
