Spark / SPARK-25317

MemoryBlock performance regression

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 2.4.0
    • Component/s: SQL
    • Labels: None

    Description

      There is a performance regression when calculating the hash code for UTF8String:

        test("hashing") {
          import org.apache.spark.unsafe.hash.Murmur3_x86_32
          import org.apache.spark.unsafe.types.UTF8String
          val hasher = new Murmur3_x86_32(0)
          val str = UTF8String.fromString("b" * 10001)
          val numIter = 100000
          val start = System.nanoTime
          for (i <- 0 until numIter) {
            Murmur3_x86_32.hashUTF8String(str, 0)
            Murmur3_x86_32.hashUTF8String(str, 0)
            Murmur3_x86_32.hashUTF8String(str, 0)
            Murmur3_x86_32.hashUTF8String(str, 0)
            Murmur3_x86_32.hashUTF8String(str, 0)
            Murmur3_x86_32.hashUTF8String(str, 0)
            Murmur3_x86_32.hashUTF8String(str, 0)
            Murmur3_x86_32.hashUTF8String(str, 0)
            Murmur3_x86_32.hashUTF8String(str, 0)
            Murmur3_x86_32.hashUTF8String(str, 0)
            Murmur3_x86_32.hashUTF8String(str, 0)
            Murmur3_x86_32.hashUTF8String(str, 0)
            Murmur3_x86_32.hashUTF8String(str, 0)
            Murmur3_x86_32.hashUTF8String(str, 0)
            Murmur3_x86_32.hashUTF8String(str, 0)
            Murmur3_x86_32.hashUTF8String(str, 0)
            Murmur3_x86_32.hashUTF8String(str, 0)
            Murmur3_x86_32.hashUTF8String(str, 0)
            Murmur3_x86_32.hashUTF8String(str, 0)
            Murmur3_x86_32.hashUTF8String(str, 0)
            Murmur3_x86_32.hashUTF8String(str, 0)
            Murmur3_x86_32.hashUTF8String(str, 0)
            Murmur3_x86_32.hashUTF8String(str, 0)
            Murmur3_x86_32.hashUTF8String(str, 0)
            Murmur3_x86_32.hashUTF8String(str, 0)
            Murmur3_x86_32.hashUTF8String(str, 0)
            Murmur3_x86_32.hashUTF8String(str, 0)
            Murmur3_x86_32.hashUTF8String(str, 0)
            Murmur3_x86_32.hashUTF8String(str, 0)
            Murmur3_x86_32.hashUTF8String(str, 0)
          }
          val duration = (System.nanoTime() - start) / 1000 / numIter
          println(s"duration $duration us")
        }
      

      To run this test in 2.3, we need to add

        public static int hashUTF8String(UTF8String str, int seed) {
          return hashUnsafeBytes(str.getBaseObject(), str.getBaseOffset(), str.numBytes(), seed);
        }
      

      to `Murmur3_x86_32`.

      On my laptop, the result for master vs. 2.3 is 120 us vs. 40 us per loop iteration (30 hash calls each), i.e. a 3x slowdown on master.
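
The test above times the loop from its first iteration, so JIT warm-up is included in the figure. The following is a minimal sketch of a timing helper that runs a warm-up phase before measuring. It is not part of Spark: `HashTiming`, `standInHash`, and `nanosPerCall` are hypothetical names, and `standInHash` is only a cheap stand-in so the sketch runs without Spark on the classpath. To reproduce the regression, the lambda would need to call `Murmur3_x86_32.hashUTF8String(str, 0)` instead.

```java
import java.util.Arrays;
import java.util.function.IntSupplier;

public class HashTiming {
    // Stand-in for Murmur3_x86_32.hashUTF8String (assumption: NOT Spark's hash).
    // A simple byte-wise polynomial hash, used only so the sketch is self-contained.
    public static int standInHash(byte[] bytes, int seed) {
        int h = seed;
        for (byte b : bytes) {
            h = 31 * h + b;
        }
        return h;
    }

    // Runs `body` warmupIters times untimed, then measuredIters times timed,
    // and returns the average wall-clock nanoseconds per timed call.
    public static double nanosPerCall(IntSupplier body, int warmupIters, int measuredIters) {
        int sink = 0; // accumulate results so the calls are not trivially dead code
        for (int i = 0; i < warmupIters; i++) {
            sink += body.getAsInt();
        }
        long start = System.nanoTime();
        for (int i = 0; i < measuredIters; i++) {
            sink += body.getAsInt();
        }
        long elapsed = System.nanoTime() - start;
        if (sink == 42) {
            System.out.print(""); // observable use of sink
        }
        return (double) elapsed / measuredIters;
    }

    public static void main(String[] args) {
        // Same input shape as the test: 10001 bytes of 'b'.
        byte[] data = new byte[10001];
        Arrays.fill(data, (byte) 'b');
        double ns = nanosPerCall(() -> standInHash(data, 0), 5_000, 10_000);
        System.out.printf("%.1f ns per call%n", ns);
    }
}
```

A hand-rolled loop like this is still only a rough indicator. For numbers meant to be compared across branches, a dedicated benchmark harness is likely the safer choice.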

          People

            Assignee: Marco Gaido (mgaido)
            Reporter: Wenchen Fan (cloud_fan)