Description
when using python 3.3 , PYTHONHASHSEED is only set in driver, but not propagated to executor, and cause the following error.
File "/Users/jzhang/github/spark/python/pyspark/rdd.py", line 74, in portable_hash raise Exception("Randomness of hash of string should be disabled via PYTHONHASHSEED") Exception: Randomness of hash of string should be disabled via PYTHONHASHSEED at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:166) at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:207) at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:125) at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313) at org.apache.spark.rdd.RDD.iterator(RDD.scala:277) at org.apache.spark.api.python.PairwiseRDD.compute(PythonRDD.scala:342) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313) at org.apache.spark.rdd.RDD.iterator(RDD.scala:277) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:77) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:45) at org.apache.spark.scheduler.Task.run(Task.scala:81) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
Attachments
Issue Links
- is duplicated by
-
SPARK-12100 bug in spark/python/pyspark/rdd.py portable_hash()
- Resolved
- links to