Spark / SPARK-38703

High GC and memory footprint after switch to ZSTD


Details

    • Type: Question
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 3.1.2
    • Fix Version/s: None
    • Component/s: Input/Output
    • Labels: None

    Description

      Hi All,

      We started switching our Spark pipelines to read Parquet files with ZSTD compression.
      After the switch, the memory footprint is much larger than it was with SNAPPY.

      Additionally, the GC overhead of the jobs is much higher compared to SNAPPY with the same workload as before.

      Are there any configurations relevant to the read path that may help in such cases?
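      For context, a sketch of the Spark settings that are commonly confused here. Note that the `spark.io.compression.zstd.*` options tune Spark's *internal* compression (shuffle, spill, broadcast) and do not affect decoding of ZSTD-compressed Parquet files, which is handled by the parquet-mr library; the read path always uses whatever codec the file was written with. The values shown are the documented defaults, not a recommendation:

      ```properties
      # spark-defaults.conf (sketch)

      # Codec used when *writing* Parquet output. Re-writing the data with
      # snappy is the way to revert the on-disk format; there is no
      # read-side codec switch.
      spark.sql.parquet.compression.codec  snappy

      # These apply only to Spark's internal compression (shuffle, spill,
      # broadcast), not to reading ZSTD-compressed Parquet.
      spark.io.compression.codec           lz4
      spark.io.compression.zstd.level      1
      spark.io.compression.zstd.bufferSize 32k
      ```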


          People

            Assignee: Unassigned
            Reporter: mixermt (Michael Taranov)
            Votes: 0
            Watchers: 3
