Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 2.7.4, 3.0.0-alpha1
    • Component/s: hdfs
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      String/byte conversions may take either a Charset instance or its canonical name, e.g. String.getBytes(Charset) versus String.getBytes(String), and new String(byte[], Charset) versus new String(byte[], String). One might expect the Charset instance to be faster because it avoids a lookup and instantiation of a Charset, but it is not. The canonical-name variants cache the string encoder/decoder (obtained from a Charset), resulting in better performance.
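      A minimal Java sketch of the two overload families described above, assuming UTF-8 as the charset. The class and method names (Utf8Conversions, toBytesViaCharset, toBytesViaName, fromBytesViaCharset, fromBytesViaName) are hypothetical and are not taken from the attached patches. It shows the practical cost of switching to the canonical-name overloads: they declare the checked UnsupportedEncodingException, which callers must handle even though every Java platform is required to support UTF-8.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class illustrating the two overload families;
// not taken from the HDFS-10662 patch.
final class Utf8Conversions {
  // Canonical name of UTF-8 ("UTF-8"), passed to the String-name overloads.
  static final String UTF8_NAME = StandardCharsets.UTF_8.name();

  private Utf8Conversions() {}

  // Charset-instance overload: String.getBytes(Charset).
  static byte[] toBytesViaCharset(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  // Canonical-name overload: String.getBytes(String), the variant the
  // description says caches its encoder on JDK 7/8.
  static byte[] toBytesViaName(String s) {
    try {
      return s.getBytes(UTF8_NAME);
    } catch (UnsupportedEncodingException e) {
      // Every Java platform must support UTF-8, so this should be unreachable.
      throw new IllegalStateException("UTF-8 not supported", e);
    }
  }

  // Charset-instance overload: new String(byte[], Charset).
  static String fromBytesViaCharset(byte[] bytes) {
    return new String(bytes, StandardCharsets.UTF_8);
  }

  // Canonical-name overload: new String(byte[], String).
  static String fromBytesViaName(byte[] bytes) {
    try {
      return new String(bytes, UTF8_NAME);
    } catch (UnsupportedEncodingException e) {
      throw new IllegalStateException("UTF-8 not supported", e);
    }
  }
}
```

      Both families produce identical bytes and strings; only the JDK-internal handling of the encoder/decoder differs, so a call site can switch from toBytesViaCharset(path) to toBytesViaName(path) without changing behavior.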

      LOG4J2-935 describes a real-world performance boost. I micro-benchmarked a marginal runtime improvement on JDK 7/8. However, for a 16-byte path, using the canonical name generated 50% less garbage; for a 64-byte path, 25% less. Given the sheer number of times that paths are (re)parsed, the cost adds up quickly.
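      One hypothetical way to reproduce a garbage comparison like the one above is to read the per-thread allocation counter around a loop of conversions. This sketch assumes a HotSpot-based JVM that exposes com.sun.management.ThreadMXBean; GarbageProbe, allocatedBytes, viaCharset, viaName and sink are illustrative names, not the benchmark that produced the quoted numbers, and results will vary with JDK version, warm-up and JIT optimizations.

```java
import java.io.UnsupportedEncodingException;
import java.lang.management.ManagementFactory;
import java.nio.charset.StandardCharsets;

// Hypothetical allocation probe; not the benchmark used in this issue.
final class GarbageProbe {
  // Written after each loop so the encoded arrays are actually used.
  static volatile long sink;

  private GarbageProbe() {}

  // Bytes allocated by the current thread while running task, or -1 when the
  // JVM does not expose per-thread allocation counters (HotSpot-specific API).
  static long allocatedBytes(Runnable task) {
    java.lang.management.ThreadMXBean bean = ManagementFactory.getThreadMXBean();
    if (!(bean instanceof com.sun.management.ThreadMXBean)) {
      return -1L;
    }
    com.sun.management.ThreadMXBean hs = (com.sun.management.ThreadMXBean) bean;
    if (!hs.isThreadAllocatedMemorySupported() || !hs.isThreadAllocatedMemoryEnabled()) {
      return -1L;
    }
    long tid = Thread.currentThread().getId();
    long before = hs.getThreadAllocatedBytes(tid);
    task.run();
    return hs.getThreadAllocatedBytes(tid) - before;
  }

  // Encodes path `iterations` times with String.getBytes(Charset).
  static Runnable viaCharset(String path, int iterations) {
    return () -> {
      long total = 0;
      for (int i = 0; i < iterations; i++) {
        total += path.getBytes(StandardCharsets.UTF_8).length;
      }
      sink = total;
    };
  }

  // Same work with String.getBytes(String) and the canonical name.
  static Runnable viaName(String path, int iterations) {
    return () -> {
      long total = 0;
      try {
        for (int i = 0; i < iterations; i++) {
          total += path.getBytes("UTF-8").length;
        }
      } catch (UnsupportedEncodingException e) {
        throw new IllegalStateException(e);
      }
      sink = total;
    };
  }
}
```

      Comparing allocatedBytes(viaCharset(path, n)) with allocatedBytes(viaName(path, n)) for a 16-byte and a 64-byte path, after a warm-up run of each, gives a rough per-call garbage figure. The percentages above were reported for JDK 7/8 and may not carry over to later JDKs.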

        Attachments

        1. HDFS-10662.patch
          10 kB
          Daryn Sharp
        2. HDFS-10662.patch.1
          9 kB
          Daryn Sharp
        3. HDFS-10662-branch-2.7.00.patch
          8 kB
          Zhe Zhang

              People

              • Assignee:
                Daryn Sharp
                Reporter:
                Daryn Sharp
              • Votes:
                0
                Watchers:
                12
