  1. Spark
  2. SPARK-21967

org.apache.spark.unsafe.types.UTF8String#compareTo Should Compare 8 Bytes at a Time for Better Performance


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.3.0
    • Component/s: Spark Core

    Description

      org.apache.spark.unsafe.types.UTF8String#compareTo contains the following TODO:

          int len = Math.min(numBytes, other.numBytes);
          // TODO: compare 8 bytes as unsigned long
          for (int i = 0; i < len; i ++) {
            // In UTF-8, the byte should be unsigned, so we should compare them as unsigned int.
      

      The TODO should be resolved by comparing as many 64-bit words as possible in this method, treating each word as an unsigned long, before falling back to byte-by-byte unsigned int comparison for the remaining bytes.
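
      The approach can be illustrated with a minimal sketch. It is not the actual Spark patch: it works on plain byte arrays through java.nio.ByteBuffer rather than on UTF8String's off-heap memory, and the names WordwiseCompare and compareBytes are hypothetical. Words are read in big-endian order so that an unsigned long comparison agrees with unsigned byte-wise lexicographic order. An implementation that reads words in native byte order would presumably need to reverse the bytes on little-endian platforms before comparing.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of Spark: compares two byte arrays
// lexicographically, treating every byte as unsigned.
final class WordwiseCompare {

  static int compareBytes(byte[] a, byte[] b) {
    int len = Math.min(a.length, b.length);
    // Largest multiple of 8 that does not exceed len.
    int wordMax = len & ~7;

    // Big-endian reads put the first byte in the most significant position,
    // so unsigned long order matches unsigned byte-wise order.
    ByteBuffer bufA = ByteBuffer.wrap(a).order(ByteOrder.BIG_ENDIAN);
    ByteBuffer bufB = ByteBuffer.wrap(b).order(ByteOrder.BIG_ENDIAN);

    // Compare 8 bytes at a time as unsigned longs.
    for (int i = 0; i < wordMax; i += 8) {
      long left = bufA.getLong(i);
      long right = bufB.getLong(i);
      if (left != right) {
        return Long.compareUnsigned(left, right);
      }
    }

    // Fall back to unsigned int comparison for the remaining bytes (at most 7).
    for (int i = wordMax; i < len; i++) {
      int res = (a[i] & 0xFF) - (b[i] & 0xFF);
      if (res != 0) {
        return res;
      }
    }

    // All shared bytes are equal, so the shorter input sorts first.
    return a.length - b.length;
  }
}
```

      Only the sign of the result is meaningful. The word loop returns -1 or 1 from Long.compareUnsigned, while the byte loop returns the difference of the two byte values.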

          People

            Assignee: original-brownbear Armin Braun
            Reporter: original-brownbear Armin Braun
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated:
              Resolved: