[SPARK-15826] PipedRDD to allow configurable char encoding - ASF JIRA


Details

Type: Bug
Status: Resolved
Priority: Trivial
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.0.0
Component/s: Spark Core
Labels: None

Description

Encountered an issue where the same code and the same input work on one cluster but fail on another. After debugging, I realised that PipedRDD picks up the JVM's default character encoding, which can differ from platform to platform. This change makes it use UTF-8 encoding, just like `ScriptTransformation` does.
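A minimal sketch of the idea in plain Java. This is not the actual PipedRDD patch. The point is to decode the piped command's stdout with an explicitly chosen charset rather than `Charset.defaultCharset()`. The class name `ExplicitCharsetReader`, the `readLines` helper, and the `ByteArrayInputStream` standing in for the child process's output are hypothetical, chosen for illustration. The decoder uses `CodingErrorAction.REPORT`, so undecodable bytes fail loudly in the same way as in the stack trace.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not Spark code: reads lines with an explicit charset.
class ExplicitCharsetReader {
    // Read every line from `in` with the given charset instead of the JVM default.
    // REPORT makes invalid byte sequences throw MalformedInputException
    // instead of being silently replaced with U+FFFD.
    static List<String> readLines(InputStream in, Charset charset) throws IOException {
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, decoder))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Simulated child-process stdout: UTF-8 bytes containing non-ASCII letters
        // (i with diaeresis, e with acute accent).
        byte[] stdout = "na\u00EFve\ncaf\u00E9\n".getBytes(StandardCharsets.UTF_8);
        // Decoding with an explicit UTF-8 charset gives the same result on every host,
        // regardless of the locale or file.encoding the JVM was started with.
        System.out.println(readLines(new ByteArrayInputStream(stdout), StandardCharsets.UTF_8));
    }
}
```

Because the charset is a parameter, a caller could pass something other than UTF-8 when the piped command is known to emit a different encoding.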

Stack trace:

Caused by: java.nio.charset.MalformedInputException: Input length = 1
	at java.nio.charset.CoderResult.throwException(CoderResult.java:281)
	at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:339)
	at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
	at java.io.InputStreamReader.read(InputStreamReader.java:184)
	at java.io.BufferedReader.fill(BufferedReader.java:161)
	at java.io.BufferedReader.readLine(BufferedReader.java:324)
	at java.io.BufferedReader.readLine(BufferedReader.java:389)
	at scala.io.BufferedSource$BufferedLineIterator.hasNext(BufferedSource.scala:67)
	at org.apache.spark.rdd.PipedRDD$$anon$1.hasNext(PipedRDD.scala:185)
	at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1612)
	at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1160)
	at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1160)
	at org.apache.spark.SparkContext$$anonfun$runJob$6.apply(SparkContext.scala:1868)
	at org.apache.spark.SparkContext$$anonfun$runJob$6.apply(SparkContext.scala:1868)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
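The failing frames (`InputStreamReader.read` down to `BufferedReader.readLine`) can be reproduced outside Spark. The sketch below relies on a hypothetical scenario: the failing host's default charset was US-ASCII, which is common when the JVM runs under a POSIX/C locale, and the piped command emitted UTF-8. The class and method names are illustrative only.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;

// Hypothetical reproduction, not Spark code.
class MalformedInputRepro {
    // Read the first line of `bytes` with a strict decoder for `charset`.
    static String firstLine(byte[] bytes, Charset charset) throws IOException {
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(bytes), decoder))) {
            return reader.readLine();
        }
    }

    // Returns the length reported by MalformedInputException, or -1 if decoding succeeded.
    static int malformedInputLength(byte[] bytes, Charset charset) throws IOException {
        try {
            firstLine(bytes, charset);
            return -1;
        } catch (MalformedInputException e) {
            return e.getInputLength();
        }
    }

    public static void main(String[] args) throws IOException {
        // "cafe" with an acute e, encoded as UTF-8: the last letter becomes bytes 0xC3 0xA9.
        byte[] utf8Output = "caf\u00E9".getBytes(StandardCharsets.UTF_8);
        try {
            firstLine(utf8Output, StandardCharsets.US_ASCII);
        } catch (MalformedInputException e) {
            // Same exception type and message as the trace: the US-ASCII decoder
            // rejects the first byte >= 0x80 as a malformed sequence of length 1.
            System.out.println("US-ASCII: " + e);
        }
        System.out.println("UTF-8: " + firstLine(utf8Output, StandardCharsets.UTF_8));
    }
}
```

Decoding the same bytes as UTF-8 succeeds. That is why pinning the charset removes the difference between clusters.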


Issue Links

links to

[Github] Pull Request #13563 (tejasapatil)


People

Assignee: Tejas Patil
Reporter: Tejas Patil
Votes: 0
Watchers: 3

Dates

Created: 08/Jun/16 18:43
Updated: 15/Jun/16 19:03
Resolved: 15/Jun/16 19:03