Spark / SPARK-15826

PipedRDD to allow configurable char encoding

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0
    • Component/s: Spark Core
    • Labels: None

    Description

      The same code works on one cluster but fails on another for the same input. Debugging showed that PipedRDD picks up the JVM's default character encoding, which can differ across platforms. This change makes it use UTF-8, just as `ScriptTransformation` does.

      Stack trace:

      Caused by: java.nio.charset.MalformedInputException: Input length = 1
      	at java.nio.charset.CoderResult.throwException(CoderResult.java:281)
      	at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:339)
      	at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
      	at java.io.InputStreamReader.read(InputStreamReader.java:184)
      	at java.io.BufferedReader.fill(BufferedReader.java:161)
      	at java.io.BufferedReader.readLine(BufferedReader.java:324)
      	at java.io.BufferedReader.readLine(BufferedReader.java:389)
      	at scala.io.BufferedSource$BufferedLineIterator.hasNext(BufferedSource.scala:67)
      	at org.apache.spark.rdd.PipedRDD$$anon$1.hasNext(PipedRDD.scala:185)
      	at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1612)
      	at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1160)
      	at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1160)
      	at org.apache.spark.SparkContext$$anonfun$runJob$6.apply(SparkContext.scala:1868)
      	at org.apache.spark.SparkContext$$anonfun$runJob$6.apply(SparkContext.scala:1868)
      	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
      	at org.apache.spark.scheduler.Task.run(Task.scala:89)
      	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
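
The failure mode in the trace can be reproduced outside Spark. The following is a minimal sketch, not the PipedRDD code: the class `PipeEncodingSketch` and the method `readLine` are hypothetical names, and US-ASCII stands in for whatever non-UTF-8 default a JVM might pick up. It assumes the reader reports malformed input rather than replacing it, which is what the `MalformedInputException` in the trace suggests; a plain `InputStreamReader` built from a charset would substitute a replacement character instead of throwing.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;

public class PipeEncodingSketch {

    // Decodes the first line of `bytes` with a strict decoder for `charset`.
    // Malformed input raises MalformedInputException instead of being replaced.
    public static String readLine(byte[] bytes, Charset charset) throws IOException {
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(bytes), decoder))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for bytes written by a piped child process that emits UTF-8.
        byte[] processOutput = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        // Explicit UTF-8: decodes the same way on every platform.
        System.out.println(readLine(processOutput, StandardCharsets.UTF_8));

        // Simulated platform default that is not UTF-8: the multi-byte
        // sequence for the accented character is rejected by the decoder.
        try {
            readLine(processOutput, StandardCharsets.US_ASCII);
        } catch (MalformedInputException e) {
            System.out.println("decode failed: " + e.getMessage());
        }
    }
}
```

Passing the charset explicitly, rather than relying on `Charset.defaultCharset()`, is what makes the result independent of how each cluster's JVM was launched.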
      

          People

            Assignee: Tejas Patil (tejasp)
            Reporter: Tejas Patil (tejasp)
            Votes: 0
            Watchers: 3
