Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 1.12.2
- Component/s: None
Description
Background
While writing out large Parquet tables with Spark, we've noticed that BinaryComparator is the source of substantial churn of extremely short-lived `HeapByteBuffer` objects: it accounts for up to 16% of all allocations in our benchmarks, putting substantial pressure on the garbage collector (see the attached allocation profile, profile_48449_alloc_1638494450_sort_by.html).
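For context, here is a simplified, purely hypothetical illustration of the kind of pattern that produces this churn (the class and method names are invented and do not mirror Parquet's actual BinaryComparator): every comparison wraps the underlying byte arrays in fresh `ByteBuffer` instances, and those wrappers immediately become short-lived garbage under heavy write load.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration of an allocation-heavy comparison; the class and
// method names are invented and do not mirror Parquet's actual internals.
public final class WrappingComparisonExample {

  private WrappingComparisonExample() {}

  // Every call wraps both arrays in fresh ByteBuffer instances, which
  // immediately become short-lived garbage under heavy write load.
  static int compareWithWrapping(byte[] a, byte[] b) {
    ByteBuffer left = ByteBuffer.wrap(a);   // allocates a HeapByteBuffer
    ByteBuffer right = ByteBuffer.wrap(b);  // allocates another HeapByteBuffer
    int minLength = Math.min(left.remaining(), right.remaining());
    for (int i = 0; i < minLength; i++) {
      // Compare bytes as unsigned values (0..255).
      int cmp = (left.get(i) & 0xFF) - (right.get(i) & 0xFF);
      if (cmp != 0) {
        return cmp;
      }
    }
    return left.remaining() - right.remaining();
  }
}
```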
Proposal
We propose adjusting the lexicographical comparison (at least) so that it performs no allocations: this code lies on the hot path of every Parquet write, so any per-comparison allocation is amplified into substantial churn. A minimal sketch of such a comparison is shown below.
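The sketch below compares the backing byte arrays directly, unsigned and without any intermediate objects; the names are illustrative only and are not the actual Parquet API.

```java
// Minimal sketch of an allocation-free unsigned lexicographical comparison
// over the backing byte arrays; names are illustrative, not the Parquet API.
public final class AllocationFreeComparisonExample {

  private AllocationFreeComparisonExample() {}

  static int compare(byte[] a, byte[] b) {
    int minLength = Math.min(a.length, b.length);
    for (int i = 0; i < minLength; i++) {
      // Compare bytes as unsigned values (0..255) rather than signed.
      int cmp = (a[i] & 0xFF) - (b[i] & 0xFF);
      if (cmp != 0) {
        return cmp;
      }
    }
    // All shared bytes are equal; the shorter array sorts first.
    return a.length - b.length;
  }
}
```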
Attachments
- profile_48449_alloc_1638494450_sort_by.html
Issue Links
- is depended upon by PARQUET-2145 Release 1.12.3 (Resolved)
- relates to HUDI-2948 Hudi Clustering Performance (Closed)