Spark / SPARK-38703

High GC and memory footprint after switch to ZSTD


Details

    • Type: Question
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 3.1.2
    • Fix Version/s: None
    • Component/s: Input/Output
    • Labels: None

    Description

      Hi All,

      We started switching our Spark pipelines to read Parquet files with ZSTD compression.
      After the switch, the memory footprint is much larger than it was with SNAPPY.

      Additionally, the GC overhead of the jobs is much higher compared to SNAPPY with the same workload as before.

      Are there any configurations relevant to the read path that may help in such cases?
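      For context, a sketch of the Spark settings that are commonly confused here. Note that the `spark.io.compression.zstd.*` options tune Spark's *internal* compression (shuffle, spill, broadcast) and do not affect decoding of ZSTD-compressed Parquet files, which is handled by the parquet-mr library; the read path always uses whatever codec the file was written with. The values shown are the documented defaults, not a recommendation:

      ```properties
      # spark-defaults.conf (sketch)

      # Codec used when *writing* Parquet output. Re-writing the data with
      # snappy is the way to revert the on-disk format; there is no
      # read-side codec switch.
      spark.sql.parquet.compression.codec  snappy

      # These apply only to Spark's internal compression (shuffle, spill,
      # broadcast), not to reading ZSTD-compressed Parquet.
      spark.io.compression.codec           lz4
      spark.io.compression.zstd.level      1
      spark.io.compression.zstd.bufferSize 32k
      ```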


          People

            Assignee: Unassigned
            Reporter: mixermt (Michael Taranov)
            Votes: 0
            Watchers: 3
