Spark / SPARK-930

Unicode failing in pyspark - UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 440: ordinal not in range(128)

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Cannot Reproduce

    Description

      Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.takePartition.
      : org.apache.spark.api.python.PythonException: Traceback (most recent call last):
        File "/opt/palantir/spark/spark-0.8.0/python/pyspark/worker.py", line 82, in main
          for obj in func(split_index, iterator):
        File "/opt/palantir/spark/spark-0.8.0/python/pyspark/serializers.py", line 41, in batched
          for item in iterator:
        File "/opt/palantir/spark/spark-0.8.0/python/pyspark/rdd.py", line 521, in takeUpToNum
          yield next(iterator)
        File "/usr/lib64/python2.6/csv.py", line 104, in next
          row = self.reader.next()
      UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 440: ordinal not in range(128)

      at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:151)
      at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:173)
      at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:116)
      at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:237)
      at org.apache.spark.rdd.RDD.iterator(RDD.scala:226)
      at org.apache.spark.scheduler.DAGScheduler.runLocallyWithinThread(DAGScheduler.scala:484)
      at org.apache.spark.scheduler.DAGScheduler$$anon$2.run(DAGScheduler.scala:470)
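
A note on the error class in the trace above. This is a hedged sketch, not the reporter's job and not a confirmed diagnosis (the issue was closed as Cannot Reproduce). The last Python frame is inside the Python 2.6 `csv` module, which works on byte strings. If it is handed a `unicode` line containing U+FFFD, an implicit ASCII encode would fail with the message shown. The sketch reproduces that failure with an explicit `.encode("ascii")` and shows a commonly used mitigation: encoding each line to UTF-8 before it reaches `csv`. The helper names `ascii_encode_error` and `utf_8_encoder`, and the sample line, are assumptions made for illustration.

```python
# -*- coding: utf-8 -*-
# Illustration only: reproduces the error class from the trace, not the
# reporter's actual job. Helper names and sample data are hypothetical.


def ascii_encode_error(text):
    """Return the UnicodeEncodeError raised by an ASCII encode, or None."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        return exc
    return None


def utf_8_encoder(unicode_lines):
    """Yield each unicode line as UTF-8 bytes.

    Assumed mitigation: on Python 2, feeding these bytes to csv.reader
    avoids any implicit ASCII encode of non-ASCII characters.
    """
    for line in unicode_lines:
        yield line.encode("utf-8")


# A made-up CSV line containing U+FFFD, the character named in the trace.
line = u"id,name\ufffd"

err = ascii_encode_error(line)          # same error class as in the trace
encoded = list(utf_8_encoder([line]))   # explicit UTF-8 bytes instead
```

On Python 2, the encoded lines would be passed to `csv.reader` and each resulting cell decoded back with `.decode("utf-8")`. Whether this applies to the reporter's data is unknown.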

          People

            Assignee: Unassigned
            Reporter: Ben Duffield (bavardage)
            Votes: 0
            Watchers: 1
