Details
- Improvement
- Status: Closed
- Minor
- Resolution: Fixed
- 1.14.0
Description
Context
PartitionPathUtils.escapePathName is a simple method: it takes a String, allocates a StringBuilder, appends each character either unchanged or escaped, and returns the resulting String.
The filesystem sink calls this method several times per element to determine the bucket id. This makes it a hot spot for workloads that write intensively to the filesystem, such as Nexmark q10. On my local machine, character escaping accounts for 9.53% of CPU samples and 17.8% of memory allocations of the whole TaskManager process.
Proposal
PartitionPathUtils.escapePathName improvements
- Use the more efficient Integer.toHexString instead of String.format
- Do not allocate a new string when the original string contains no escapable character
- Size the StringBuilder based on the original string length instead of using the default capacity
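The three changes above can be sketched roughly as follows. This is a minimal illustration, not the actual patch: the class name EscapePathSketch, the set of escapable characters, the "%XX" output form, and the extra capacity of 16 are all assumptions made for the example.

```java
import java.util.BitSet;
import java.util.Locale;

final class EscapePathSketch {
    // Hypothetical set of escapable characters; the real set is defined in PartitionPathUtils.
    private static final BitSet CHAR_TO_ESCAPE = new BitSet(128);

    static {
        for (char c = 0; c < ' '; c++) {
            CHAR_TO_ESCAPE.set(c); // control characters
        }
        char[] special = {'"', '#', '%', '\'', '*', '/', ':', '=', '?', '\\', '{', '[', ']', '^', 0x7F};
        for (char c : special) {
            CHAR_TO_ESCAPE.set(c);
        }
    }

    private static boolean needsEscaping(char c) {
        // BitSet.get returns false for indexes beyond its size, so no range check is needed.
        return CHAR_TO_ESCAPE.get(c);
    }

    static String escapePathName(String path) {
        if (path == null || path.isEmpty()) {
            throw new IllegalArgumentException("Path should not be null or empty");
        }

        // Improvement 2: scan first and return the original string if nothing needs escaping.
        int first = -1;
        for (int i = 0; i < path.length(); i++) {
            if (needsEscaping(path.charAt(i))) {
                first = i;
                break;
            }
        }
        if (first < 0) {
            return path;
        }

        // Improvement 3: size the builder from the input length (the slack of 16 is an assumption).
        StringBuilder sb = new StringBuilder(path.length() + 16);
        sb.append(path, 0, first); // copy the prefix that needs no escaping
        for (int i = first; i < path.length(); i++) {
            char c = path.charAt(i);
            if (needsEscaping(c)) {
                // Improvement 1: Integer.toHexString instead of String.format.
                String hex = Integer.toHexString(c).toUpperCase(Locale.ROOT);
                sb.append('%');
                if (hex.length() == 1) {
                    sb.append('0'); // pad to two hex digits
                }
                sb.append(hex);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapePathName("dt=2021-10-01")); // prints dt%3D2021-10-01
        System.out.println(escapePathName("plain"));         // prints plain (same String instance)
    }
}
```

In this sketch the common case, a path with no escapable characters, costs a single scan and no allocation, which is where most of the saving would be expected.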
Benefit
Experiment on a local machine:
1 TaskManager with 6 slots, job parallelism 6, Nexmark default configuration with the object reuse option enabled.
Before: flink-1.14.0
After: flink-1.14.0 + patch with the improvements
| Nexmark q10 | Before | After |
|---|---|---|
| CPU samples of escapePathName() (% of all) | 9.53 | 1.64 |
| Memory allocations by escapePathName() (% of all) | 17.8 | 2.98 |
| Throughput/Cores (K/s) | 107.64 | 119.42 |
Diff: CPU -7.89 percentage points (9.53 to 1.64), memory -14.82 percentage points (17.8 to 2.98), throughput +10.9% (107.64 to 119.42 K/s).
Profiling reports are in the attachment.
Attachments