Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Cannot Reproduce
Affects Version/s: 2.4.0
Fix Version/s: None
Component/s: None
Description
Ingestion of a CSV file seems to fail with Spark v2.4.0 and Scala v2.12, whereas it works fine with Scala v2.11.
When running a simple CSV ingestion like:
// Creates a session on a local master
SparkSession spark = SparkSession.builder()
    .appName("CSV to Dataset")
    .master("local")
    .getOrCreate();

// Reads a CSV file with header, called books.csv, stores it in a dataframe
Dataset<Row> df = spark.read().format("csv")
    .option("header", "true")
    .load("data/books.csv");
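For reference, a minimal self-contained version of that class (sketched from the package and class names in the stack trace below; the imports, the main() scaffolding, and the final df.show() call are assumptions, not copied from the repository):

package net.jgp.books.sparkWithJava.ch01;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class CsvToDataframeApp {

  public static void main(String[] args) {
    CsvToDataframeApp app = new CsvToDataframeApp();
    app.start();
  }

  private void start() {
    // Creates a session on a local master
    SparkSession spark = SparkSession.builder()
        .appName("CSV to Dataset")
        .master("local")
        .getOrCreate();

    // Reads a CSV file with header, called books.csv, stores it in a dataframe
    Dataset<Row> df = spark.read().format("csv")
        .option("header", "true")
        .load("data/books.csv");

    // Shows a sample of the dataframe (assumed; not part of the snippet above)
    df.show(5);
  }
}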
With Scala 2.12, I get:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 10582
at com.thoughtworks.paranamer.BytecodeReadingParanamer$ClassReader.accept(BytecodeReadingParanamer.java:563)
at com.thoughtworks.paranamer.BytecodeReadingParanamer$ClassReader.access$200(BytecodeReadingParanamer.java:338)
at com.thoughtworks.paranamer.BytecodeReadingParanamer.lookupParameterNames(BytecodeReadingParanamer.java:103)
at com.thoughtworks.paranamer.CachingParanamer.lookupParameterNames(CachingParanamer.java:90)
at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.getCtorParams(BeanIntrospector.scala:44)
at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.$anonfun$apply$1(BeanIntrospector.scala:58)
at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.$anonfun$apply$1$adapted(BeanIntrospector.scala:58)
at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:240)
...
at net.jgp.books.sparkWithJava.ch01.CsvToDataframeApp.start(CsvToDataframeApp.java:37)
at net.jgp.books.sparkWithJava.ch01.CsvToDataframeApp.main(CsvToDataframeApp.java:21)
Whereas it works smoothly if I switch back to Scala 2.11.
A full example is available at https://github.com/jgperrin/net.jgp.books.sparkWithJava.ch01. You can modify pom.xml to easily change the Scala version in the properties section:
<properties>
  <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  <java.version>1.8</java.version>
  <scala.version>2.11</scala.version>
  <spark.version>2.4.0</spark.version>
</properties>
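Those properties are then picked up by the Spark dependency declarations; in a typical Maven setup for this project they would be referenced roughly like this (a sketch, the exact dependency list in the repository may differ):

<!-- Spark artifacts are suffixed with the Scala binary version, so changing
     scala.version between 2.11 and 2.12 switches the Spark build being used -->
<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_${scala.version}</artifactId>
    <version>${spark.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_${scala.version}</artifactId>
    <version>${spark.version}</version>
  </dependency>
</dependencies>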
(P.S. This is my first bug submission, so I hope I did not mess up too much; please be tolerant if I did.)