[HDFS-10662] Optimize UTF8 string/byte conversions - ASF JIRA

Voters

Watch issue

Watchers

Link

Clone

Update Comment Author

Replace String in Comment

Update Comment Visibility

Delete Comments

XML

Word

Printable

JSON

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.8.0, 2.7.4, 3.0.0-alpha1
Component/s: hdfs
Labels:
None

Target Version/s:

2.9.0
Hadoop Flags:

Reviewed

Description

String/byte conversions may take either a Charset instance or its canonical name. One might think a Charset instance would be faster due to avoiding a lookup and instantiation of a Charset, but it's not. The canonical string name variants will cache the string encoder/decoder (obtained from a Charset) resulting in better performance.

~~LOG4J2-935~~ describes a real-world performance boost. I micro-benched a marginal runtime improvement on jdk 7/8. However for a 16 byte path, using the canonical name generated 50% less garbage. For a 64 byte path, 25% of the garbage. Given the sheer number of times that paths are (re)parsed, the cost adds up quickly.