Spark / SPARK-26146

CSV wouldn't be ingested in Spark 2.4.0 with Scala 2.12


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 2.4.0
    • Fix Version/s: None
    • Component/s: Input/Output
    • Labels: None

    Description

      Ingestion of a CSV file seems to fail with Spark v2.4.0 and Scala v2.12, whereas it works with Scala v2.11.

      When running a simple CSV ingestion like: 

          // Creates a session on a local master
          SparkSession spark = SparkSession.builder()
              .appName("CSV to Dataset")
              .master("local")
              .getOrCreate();
          // Reads a CSV file with header, called books.csv, stores it in a dataframe
          Dataset<Row> df = spark.read().format("csv")
              .option("header", "true")
              .load("data/books.csv");
      

        With Scala 2.12, I get: 

      Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 10582
      at com.thoughtworks.paranamer.BytecodeReadingParanamer$ClassReader.accept(BytecodeReadingParanamer.java:563)
      at com.thoughtworks.paranamer.BytecodeReadingParanamer$ClassReader.access$200(BytecodeReadingParanamer.java:338)
      at com.thoughtworks.paranamer.BytecodeReadingParanamer.lookupParameterNames(BytecodeReadingParanamer.java:103)
      at com.thoughtworks.paranamer.CachingParanamer.lookupParameterNames(CachingParanamer.java:90)
      at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.getCtorParams(BeanIntrospector.scala:44)
      at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.$anonfun$apply$1(BeanIntrospector.scala:58)
      at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.$anonfun$apply$1$adapted(BeanIntrospector.scala:58)
      at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:240)
      ...
      at net.jgp.books.sparkWithJava.ch01.CsvToDataframeApp.start(CsvToDataframeApp.java:37)
      at net.jgp.books.sparkWithJava.ch01.CsvToDataframeApp.main(CsvToDataframeApp.java:21)
      

      It works smoothly if I switch back to Scala 2.11.

      Full example available at https://github.com/jgperrin/net.jgp.books.sparkWithJava.ch01. You can modify pom.xml to easily change the Scala version in the properties section:

      <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <java.version>1.8</java.version>
        <scala.version>2.11</scala.version>
        <spark.version>2.4.0</spark.version>
      </properties>
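
      For what it's worth, the stack trace points into com.thoughtworks.paranamer's bytecode reader. A possible workaround, sketched under the assumption that an old paranamer version is being pulled in transitively (versions before 2.8 reportedly cannot parse Scala 2.12 class files), would be to pin paranamer 2.8 in pom.xml. This is a guess on my part, not a confirmed fix:

          <!-- Assumption: force paranamer 2.8, which can read Scala 2.12
               bytecode; older versions may throw ArrayIndexOutOfBoundsException
               in BytecodeReadingParanamer, as in the trace above. -->
          <dependencyManagement>
            <dependencies>
              <dependency>
                <groupId>com.thoughtworks.paranamer</groupId>
                <artifactId>paranamer</artifactId>
                <version>2.8</version>
              </dependency>
            </dependencies>
          </dependencyManagement>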


      (PS: It's my first bug submission, so I hope I did not mess anything up; be tolerant if I did.)



People

    Assignee: Unassigned
    Reporter: Jean Georges Perrin (jgp)
