SPARK-5904 (sub-task of SPARK-5166 Stabilize Spark SQL APIs)

DataFrame methods with varargs do not work in Java


    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.3.0
    • Fix Version/s: 1.3.0
    • Component/s: Java API, SQL
    • Labels:
    • Target Version/s:

      Description

      DataFrame methods with varargs fail when called from Java due to a bug in Scala.

      This can be reproduced by, e.g., modifying the end of the example ml.JavaSimpleParamsExample in the master branch:

          DataFrame results = model2.transform(test);
          results.printSchema(); // works
          results.collect(); // works
          results.filter("label > 0.0").count(); // works
          for (Row r: results.select("features", "label", "myProbability", "prediction").collect()) { // fails on select
            System.out.println("(" + r.get(0) + ", " + r.get(1) + ") -> prob=" + r.get(2)
                + ", prediction=" + r.get(3));
          }
      

      I also tried groupBy, which fails in the same way.

      The error looks like this (this trace is from the groupBy call):

      Exception in thread "main" java.lang.AbstractMethodError: org.apache.spark.sql.DataFrameImpl.groupBy(Ljava/lang/String;[Ljava/lang/String;)Lorg/apache/spark/sql/GroupedData;
      	at org.apache.spark.examples.ml.JavaSimpleParamsExample.main(JavaSimpleParamsExample.java:108)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:606)
      	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
      	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
      	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
      	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
      	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
      

      The error appears to come from this Scala bug involving varargs in an abstract method:
      https://issues.scala-lang.org/browse/SI-9013
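
      For Java readers: the descriptor in the error, (Ljava/lang/String;[Ljava/lang/String;), is the compiled form of a (String, String...) parameter list. Below is a minimal sketch of how reflection can show where such an overload is declared and whether it is abstract. Frame and FrameImpl are hypothetical stand-ins, not Spark classes. Running the same check against DataFrameImpl would presumably report the abstract declaration, but that is an assumption and has not been verified here.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical stand-ins for DataFrame / DataFrameImpl; these are not Spark classes.
class VarargsReflectionSketch {
    abstract static class Frame {
        // A Java varargs parameter compiles to an array: (String, String[]).
        public abstract String groupBy(String col1, String... cols);
    }

    static class FrameImpl extends Frame {
        @Override
        public String groupBy(String col1, String... cols) {
            return col1 + "+" + cols.length;
        }
    }

    // Looks up the (String, String[]) overload visible on cls and reports
    // its declaring class, whether it is abstract, and whether it is varargs.
    static String describe(Class<?> cls, String name) {
        try {
            Method m = cls.getMethod(name, String.class, String[].class);
            return m.getDeclaringClass().getSimpleName()
                + " abstract=" + Modifier.isAbstract(m.getModifiers())
                + " varargs=" + m.isVarArgs();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(Frame.class, "groupBy"));
        System.out.println(describe(FrameImpl.class, "groupBy"));
    }
}
```

      If the lookup on a concrete class resolves to an abstract method declared in its parent, calling that method at runtime would be expected to fail with AbstractMethodError, as in the trace above.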

      My current plan is to move the implementations of the methods with varargs from DataFrameImpl to DataFrame.
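
      The shape of that plan can be sketched in Java as follows. This is only an illustration of the pattern, not the actual Scala change: Frame, FrameImpl and selectColumns are hypothetical names, and the real DataFrame API may differ. The varargs entry point is concrete in the base class and delegates to a non-varargs abstract method, so no varargs method is left abstract.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; Frame and FrameImpl are not Spark classes.
class VarargsInBaseSketch {
    abstract static class Frame {
        // The varargs method is implemented here, in the base class.
        public final String select(String col, String... cols) {
            List<String> all = new ArrayList<>();
            all.add(col);
            all.addAll(Arrays.asList(cols));
            return selectColumns(all);
        }

        // Only this non-varargs method is left for subclasses to implement.
        protected abstract String selectColumns(List<String> cols);
    }

    static class FrameImpl extends Frame {
        @Override
        protected String selectColumns(List<String> cols) {
            return "SELECT " + String.join(", ", cols);
        }
    }

    public static void main(String[] args) {
        Frame f = new FrameImpl();
        System.out.println(f.select("features", "label", "prediction"));
    }
}
```

      With this layout a Java caller always binds to a concrete varargs method, and subclasses only need to override the list-based one.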

      However, this may cause issues with IncomputableColumn. Feedback is welcome.

      Thanks to Josh Rosen for figuring out the bug and the fix!


            People

            • Assignee: Reynold Xin (rxin)
            • Reporter: Joseph K. Bradley (josephkb)
            • Votes: 0
            • Watchers: 5
