Spark / SPARK-21967

org.apache.spark.unsafe.types.UTF8String#compareTo Should Compare 8 Bytes at a Time for Better Performance

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.3.0
    • Component/s: Spark Core

      Description

      org.apache.spark.unsafe.types.UTF8String#compareTo contains the following TODO:

          int len = Math.min(numBytes, other.numBytes);
          // TODO: compare 8 bytes as unsigned long
          for (int i = 0; i < len; i ++) {
            // In UTF-8, the byte should be unsigned, so we should compare them as unsigned int.
      

      The TODO should be resolved by comparing as many 64-bit words as possible in this method, as unsigned longs, before falling back to byte-wise unsigned int comparison for the remaining bytes.
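
A minimal sketch of the idea, not the patch that resolved this issue. It assumes the strings are held in plain byte[] arrays and reads words through java.nio.ByteBuffer, whereas UTF8String works on raw memory; the class and method names (WordwiseCompare, compareBytes) are hypothetical. Words are read in big-endian order so that an unsigned 64-bit comparison orders them the same way as a byte-by-byte unsigned comparison. Code that reads native-order longs on little-endian hardware would need to reverse the bytes of each word (for example with Long.reverseBytes) before comparing.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration; not part of Spark.
final class WordwiseCompare {

    // Lexicographic comparison of two byte sequences, treating bytes as unsigned.
    // Only the sign of the result is meaningful.
    static int compareBytes(byte[] a, byte[] b) {
        int len = Math.min(a.length, b.length);
        // Big-endian reads put the first byte in the most significant position.
        ByteBuffer left = ByteBuffer.wrap(a).order(ByteOrder.BIG_ENDIAN);
        ByteBuffer right = ByteBuffer.wrap(b).order(ByteOrder.BIG_ENDIAN);

        int wordMax = len & ~7; // largest multiple of 8 that is <= len
        int i = 0;
        // Compare 8 bytes at a time as unsigned longs.
        for (; i < wordMax; i += 8) {
            long l = left.getLong(i);
            long r = right.getLong(i);
            if (l != r) {
                return Long.compareUnsigned(l, r);
            }
        }
        // Fall back to unsigned byte comparison for the remaining 0..7 bytes.
        for (; i < len; i++) {
            int res = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (res != 0) {
                return res;
            }
        }
        // All shared bytes are equal: the shorter sequence sorts first.
        return a.length - b.length;
    }
}
```

Long.compareUnsigned matters here because any UTF-8 lead byte of 0x80 or above sets the sign bit of the word, so a signed comparison would order multi-byte characters before ASCII.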

            People

            • Assignee: Armin Braun (original-brownbear)
            • Reporter: Armin Braun (original-brownbear)
