Project: Spark, issue SPARK-40660

Switch to XORShiftRandom to distribute elements


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.3.1, 3.2.3, 3.4.0
    • Component/s: SQL
    • Labels: None

    Description

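      import java.util.Random
      import org.apache.spark.util.random.XORShiftRandom
      import scala.util.hashing

      // Treats each seed 1..count as a partition id, takes the first nextInt(partition) draw
      // from three generators, and prints how many seeds landed in each bucket.
      // XORShiftRandom is declared private[spark], so this snippet presumably runs from code
      // inside the org.apache.spark package (for example a Spark test suite).
      def distribution(count: Int, partition: Int): Unit = {
        // 1. java.util.Random seeded directly with the partition id
        println((1 to count).map(partitionId => new Random(partitionId).nextInt(partition))
          .groupBy(identity)
          .map(_._2.size).mkString(". "))

        // 2. java.util.Random seeded with the byteswap32-scrambled partition id
        println((1 to count).map(partitionId => new Random(hashing.byteswap32(partitionId)).nextInt(partition))
          .groupBy(identity)
          .map(_._2.size).mkString(". "))

        // 3. Spark's XORShiftRandom seeded with the partition id
        println((1 to count).map(partitionId => new XORShiftRandom(partitionId).nextInt(partition))
          .groupBy(identity)
          .map(_._2.size).mkString(". "))
      }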
      
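      distribution(200, 4)

      Output (one line per generator, in the order of the println calls; each number is a bucket size):
      200
      50. 60. 46. 44
      55. 48. 43. 54

      With java.util.Random seeded directly by the partition id, all 200 seeds return the same
      first nextInt(4) value, so a single bucket receives every element. Seeding with
      byteswap32(partitionId) or switching to XORShiftRandom spreads the 200 seeds across all
      four buckets (bucket sizes between 43 and 60).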
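The first output line (200) follows from how java.util.Random seeds itself and how nextInt treats a power-of-two bound. The sketch below recomputes that first draw by hand; FirstDrawSketch and firstNextIntPow2 are hypothetical names, and the constants are the ones the java.util.Random documentation gives for setSeed and next (multiplier 0x5DEECE66D, addend 0xB, 48-bit state). Seeds 1 to 200 differ only in their low 8 bits, so after one LCG step the states fall in a window narrower than 2^43, while nextInt(4) keeps only bits 46 and 47 of the state. For these seeds the window does not cross a 2^46 boundary (as the output's 200 shows), so every seed yields the same value.

```scala
/** Hypothetical helper: recomputes the first `new java.util.Random(seed).nextInt(bound)`
  * by hand, for power-of-two bounds only, using the LCG constants from the JDK docs. */
object FirstDrawSketch {
  private val Multiplier = 0x5DEECE66DL
  private val Addend = 0xBL
  private val Mask = (1L << 48) - 1 // java.util.Random keeps a 48-bit state

  def firstNextIntPow2(seed: Long, bound: Int): Int = {
    require(bound > 0 && (bound & (bound - 1)) == 0, "bound must be a power of two")
    val scrambled = (seed ^ Multiplier) & Mask           // what the Random(seed) constructor stores
    val state = (scrambled * Multiplier + Addend) & Mask // one LCG step inside next(31)
    val r = (state >>> 17).toInt                          // next(31) returns the top 31 of 48 bits
    ((bound.toLong * r) >> 31).toInt                      // power-of-two path keeps the top bits of r
  }

  def main(args: Array[String]): Unit = {
    val seeds = 1 to 200
    val byHand = seeds.map(s => firstNextIntPow2(s, 4))
    val byJdk = seeds.map(s => new java.util.Random(s).nextInt(4))
    println(s"matches java.util.Random: ${byHand == byJdk}")
    println(s"distinct first draws for seeds 1..200: ${byHand.distinct.size}")
  }
}
```

Running main prints whether the hand computation agrees with java.util.Random and how many distinct first draws the 200 seeds produce.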
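For contrast, the idea behind an xorshift generator can be sketched as a java.util.Random subclass that overrides next(bits), the method Random builds nextInt and most of its other methods on. This is a hypothetical stand-in, not Spark's XORShiftRandom: the class name XorShiftSketch, the SplitMix64-style finalizer used to scramble the seed, and the (21, 35, 4) shift triple are assumptions made for illustration. Because the seed is mixed before the first xorshift step, neighbouring partition ids no longer start from nearly identical states, so the bits nextInt(4) keeps vary from seed to seed.

```scala
import java.util.Random

/** Hypothetical xorshift64 generator for illustration; not Spark's XORShiftRandom. */
class XorShiftSketch(init: Long) extends Random(init) {
  // Seed scrambled once so that consecutive seeds start from unrelated states.
  // xorshift has a fixed point at 0, so a zero state is replaced by a constant.
  private var state: Long = {
    val mixed = XorShiftSketch.mix64(init)
    if (mixed == 0L) 0x2545F4914F6CDD1DL else mixed
  }

  // java.util.Random builds nextInt, nextLong, nextDouble, ... on top of next(bits).
  // Note: setSeed is not overridden, so reseeding after construction does not affect `state`.
  override protected def next(bits: Int): Int = {
    var x = state
    x ^= x << 21
    x ^= x >>> 35
    x ^= x << 4
    state = x
    (x & ((1L << bits) - 1)).toInt // low `bits` bits of the new state
  }
}

object XorShiftSketch {
  /** SplitMix64-style finalizer, assumed here as the seed scrambler. */
  def mix64(seed: Long): Long = {
    var z = seed
    z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L
    z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL
    z ^ (z >>> 31)
  }

  def main(args: Array[String]): Unit = {
    // Same experiment as distribution(200, 4): first nextInt(4) per seed, then bucket sizes.
    val sizes = (1 to 200).map(s => new XorShiftSketch(s).nextInt(4))
      .groupBy(identity).map(_._2.size)
    println(sizes.mkString(". "))
  }
}
```

Overriding only next(bits) keeps the rest of the java.util.Random API; the trade-off in this sketch is that setSeed still updates Random's own state rather than the xorshift state.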
      

People

    • Assignee: Yuming Wang (yumwang)
    • Reporter: Yuming Wang (yumwang)
    • Votes: 0
    • Watchers: 3
