  Spark / SPARK-5166 Stabilize Spark SQL APIs / SPARK-5904

DataFrame methods with varargs do not work in Java


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.3.0
    • Fix Version/s: 1.3.0
    • Component/s: Java API, SQL

    Description

      DataFrame methods with varargs fail when called from Java due to a bug in Scala.

      This can be reproduced by, for example, modifying the end of the example ml.JavaSimpleParamsExample in the master branch:

          DataFrame results = model2.transform(test);
          results.printSchema(); // works
          results.collect(); // works
          results.filter("label > 0.0").count(); // works
          for (Row r: results.select("features", "label", "myProbability", "prediction").collect()) { // fails on select
            System.out.println("(" + r.get(0) + ", " + r.get(1) + ") -> prob=" + r.get(2)
                + ", prediction=" + r.get(3));
          }
      

      I have also tried groupBy and found that it fails too.

      The error looks like this:

      Exception in thread "main" java.lang.AbstractMethodError: org.apache.spark.sql.DataFrameImpl.groupBy(Ljava/lang/String;[Ljava/lang/String;)Lorg/apache/spark/sql/GroupedData;
      	at org.apache.spark.examples.ml.JavaSimpleParamsExample.main(JavaSimpleParamsExample.java:108)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:606)
      	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
      	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
      	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
      	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
      	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
      

      The error appears to be caused by this Scala bug involving varargs on an abstract method:
      https://issues.scala-lang.org/browse/SI-9013
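      The `[Ljava/lang/String;` in the descriptor above is how the JVM sees a Java varargs parameter: a plain `String[]`. A Java call such as `groupBy("label", "prediction")` therefore links against a concrete `groupBy(String, String[])` method, and the `AbstractMethodError` means the method resolved on the runtime class was abstract. The minimal sketch below is plain Java, not Spark code; the class `VarargsDescriptorDemo` and its helpers are hypothetical names used only to show that a varargs method compiles to the `(Ljava/lang/String;[Ljava/lang/String;)` parameter signature seen in the trace.

```java
import java.lang.reflect.Method;

// Hypothetical demo class; not part of Spark.
class VarargsDescriptorDemo {

    // Java compiles "String... cols" into a String[] parameter plus a varargs flag.
    static String groupBy(String col, String... cols) {
        StringBuilder sb = new StringBuilder(col);
        for (String c : cols) {
            sb.append(',').append(c);
        }
        return sb.toString();
    }

    // Looks up groupBy reflectively; the varargs method is found by its array signature.
    static Method groupByMethod() {
        try {
            return VarargsDescriptorDemo.class.getDeclaredMethod("groupBy", String.class, String[].class);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds the parameter part of a JVM method descriptor (reference types only).
    static String paramDescriptor(Method m) {
        StringBuilder sb = new StringBuilder("(");
        for (Class<?> p : m.getParameterTypes()) {
            if (p.isPrimitive()) {
                throw new IllegalArgumentException("primitive parameters are not handled in this sketch");
            }
            String name = p.getName().replace('.', '/');
            // Array classes already report names like "[Ljava.lang.String;".
            sb.append(p.isArray() ? name : "L" + name + ";");
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        Method m = groupByMethod();
        System.out.println(m.isVarArgs());        // true
        System.out.println(paramDescriptor(m));   // (Ljava/lang/String;[Ljava/lang/String;)
        // Both call forms invoke the same groupBy(String, String[]) method.
        System.out.println(groupBy("label", "prediction", "features"));
        System.out.println(groupBy("label", new String[] {"prediction", "features"}));
    }
}
```

      Because both call forms bind to the same array-taking method, any class that Java code calls through must expose a concrete `(String, String[])` method at runtime, which is exactly what was missing on DataFrameImpl.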

      My current plan is to move the implementations of the methods with varargs from DataFrameImpl to DataFrame.
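      A rough Java analog of this plan: the varargs entry point is fully implemented in the abstract base class, and the subclass only implements a non-varargs, list-based method, so nothing varargs-related has to be generated or overridden in the subclass. `Frame`, `FrameImpl`, `selectAll`, and `VarargsFixDemo` are hypothetical stand-ins for `DataFrame`, `DataFrameImpl`, and whatever internal method the real fix delegates to; the real classes are Scala, and this sketch does not address the `IncomputableColumn` question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the abstract DataFrame; not Spark code.
abstract class Frame {
    // The varargs method is implemented here, so select(String, String[])
    // is a concrete method inherited by every subclass.
    final List<String> select(String col, String... cols) {
        List<String> all = new ArrayList<>();
        all.add(col);
        all.addAll(Arrays.asList(cols));
        return selectAll(all);
    }

    // Subclasses implement only this non-varargs, list-based method.
    abstract List<String> selectAll(List<String> cols);
}

// Hypothetical stand-in for DataFrameImpl.
class FrameImpl extends Frame {
    @Override
    List<String> selectAll(List<String> cols) {
        // A real implementation would build a projection; this just returns the names.
        return Collections.unmodifiableList(new ArrayList<>(cols));
    }
}

class VarargsFixDemo {
    public static void main(String[] args) {
        Frame df = new FrameImpl();
        System.out.println(df.select("features", "label", "myProbability", "prediction"));
    }
}
```

      For example, `new FrameImpl().select("features", "label")` returns `[features, label]`. Keeping the varargs method `final` in the base class makes it explicit that subclasses customize behavior only through the list-based method.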

      However, this may cause issues with IncomputableColumn. Any feedback?

      Thanks to joshrosen for figuring out the bug and the fix!


          People

            Assignee: rxin Reynold Xin
            Reporter: josephkb Joseph K. Bradley
            Votes: 0
            Watchers: 5
