Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-15611

Got the same sequence random number in every forked worker.

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Minor
    • Resolution: Fixed
    • 1.6.1
    • None
    • PySpark
    • None

    Description

      hi, i'm writing some code as below:

      marlkov.py
      from random import random
      from operator import add
      def funcx( x ):
        print x[0],x[1]
        return 1 if x[0]** 2 + x[1]** 2 < 1 else 0
      def genRnd(ind):
        x=random() * 2 - 1
        y=random() * 2 - 1
        return (x,y)
      def runsp(total):
        ret=sc.parallelize(xrange(total),1).map(genRnd).map(funcx).reduce(lambda x, y: x + y)/float(total) * 4
        print ret
      runsp(3)
      

      once started the pyspark shell, no matter how many times i run "runsp(N)" , this code always get a same sequece of random numbers, like this

      Output
      0.896083541418 -0.635625854075
      -0.0423532645466 -0.526910255885
      0.498518696049 -0.872983895832
      1.3333333333333333
      >>> sc.parallelize(xrange(total),1).map(genRnd).map(funcx).reduce(add)/float(total) * 4
      0.896083541418 -0.635625854075
      -0.0423532645466 -0.526910255885
      0.498518696049 -0.872983895832
      1.3333333333333333
      >>> sc.parallelize(xrange(total),1).map(genRnd).map(funcx).reduce(add)/float(total) * 4
      0.896083541418 -0.635625854075
      -0.0423532645466 -0.526910255885
      0.498518696049 -0.872983895832
      1.3333333333333333
      

      i think this is because when we import pyspark.worker in the daemon.py, we alse import a random by shuffle.py which is imported by pyspark.worker,
      this worker, forked by pid = os.fork(), also remains the state of the parent's random, thus every forked worker get the same random.next().

      we need to re-random the random by random.seed, which will solve the problem, but i think this PR. may not be the proper fix.
      ths.

      Attachments

        Activity

          People

            Unassigned Unassigned
            thomaslau Thomas Lau
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: