Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Cannot Reproduce
Description
Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.takePartition.
: org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/opt/palantir/spark/spark-0.8.0/python/pyspark/worker.py", line 82, in main
    for obj in func(split_index, iterator):
  File "/opt/palantir/spark/spark-0.8.0/python/pyspark/serializers.py", line 41, in batched
    for item in iterator:
  File "/opt/palantir/spark/spark-0.8.0/python/pyspark/rdd.py", line 521, in takeUpToNum
    yield next(iterator)
  File "/usr/lib64/python2.6/csv.py", line 104, in next
    row = self.reader.next()
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 440: ordinal not in range(128)
    at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:151)
    at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:173)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:116)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:237)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:226)
    at org.apache.spark.scheduler.DAGScheduler.runLocallyWithinThread(DAGScheduler.scala:484)
    at org.apache.spark.scheduler.DAGScheduler$$anon$2.run(DAGScheduler.scala:470)
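
The report does not include the job's own code, so the following is only a guess at the mechanism, not a confirmed reproduction. The innermost Python frame is in csv.py, and on Python 2 the csv module implicitly encodes unicode input as ASCII, which fails for any non-ASCII character such as u'\ufffd' (the Unicode replacement character). A minimal sketch, assuming the RDD held unicode lines that were passed to a csv reader; the helper names `ascii_encode_fails` and `parse_csv_lines` and the sample data are hypothetical and are not part of PySpark:

```python
# -*- coding: utf-8 -*-
import csv
import sys


def ascii_encode_fails(text):
    """Return True if encoding `text` as ASCII raises UnicodeEncodeError."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return True
    return False


def parse_csv_lines(lines):
    """Parse an iterable of unicode lines into rows of unicode fields.

    Hypothetical helper. On Python 2, csv.reader expects byte strings,
    so each line is encoded to UTF-8 first and each field is decoded
    afterwards. On Python 3, csv works on text directly.
    """
    if sys.version_info[0] == 2:
        encoded = (line.encode("utf-8") for line in lines)
        for row in csv.reader(encoded):
            yield [field.decode("utf-8") for field in row]
    else:
        for row in csv.reader(lines):
            yield row


# Made-up sample data containing the character named in the traceback.
sample = [u"id,name", u"1,caf\ufffd"]
rows = list(parse_csv_lines(sample))
```

If this guess is right, the same helper could be applied per partition (for example through `mapPartitions`) instead of handing unicode lines straight to the csv module. The presence of u'\ufffd' may also indicate that the input contained bytes that were not valid in the encoding used to decode it, which would be worth checking in the source file.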