[SPARK-5904] DataFrame methods with varargs do not work in Java - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Blocker
Resolution: Fixed
Affects Version/s: 1.3.0
Fix Version/s: 1.3.0
Component/s: Java API, SQL
Labels: DataFrame
Target Version/s: 1.3.0

Description

DataFrame methods with varargs fail when called from Java due to a bug in Scala.

This can be reproduced by, e.g., modifying the end of the example ml.JavaSimpleParamsExample in the master branch:

    DataFrame results = model2.transform(test);
    results.printSchema(); // works
    results.collect(); // works
    results.filter("label > 0.0").count(); // works
    for (Row r: results.select("features", "label", "myProbability", "prediction").collect()) { // fails on select
      System.out.println("(" + r.get(0) + ", " + r.get(1) + ") -> prob=" + r.get(2)
          + ", prediction=" + r.get(3));
    }

I also tried groupBy, and it failed in the same way.

The error (here from the groupBy call) looks like this:

    Exception in thread "main" java.lang.AbstractMethodError: org.apache.spark.sql.DataFrameImpl.groupBy(Ljava/lang/String;[Ljava/lang/String;)Lorg/apache/spark/sql/GroupedData;
        at org.apache.spark.examples.ml.JavaSimpleParamsExample.main(JavaSimpleParamsExample.java:108)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
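
For readers unfamiliar with JVM descriptors: the signature in the error, `(Ljava/lang/String;[Ljava/lang/String;)`, is what a Java caller targets when it invokes a method declared as `(String, String...)`, since Java varargs compile to an array parameter. The sketch below shows this with plain reflection. `DescriptorDemo` and its `groupBy` are hypothetical stand-ins with the same parameter shape, not Spark code, and the return type is `Object` only to keep the example self-contained.

```java
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;

class DescriptorDemo {
    // Hypothetical method with the same parameter shape as the failing call.
    static Object groupBy(String col, String... cols) {
        return null;
    }

    // Looks up the method by its erased parameter types (String, String[]).
    private static Method lookup() {
        try {
            return DescriptorDemo.class.getDeclaredMethod(
                "groupBy", String.class, String[].class);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds the JVM descriptor string for the method.
    static String descriptor() {
        Method m = lookup();
        return MethodType.methodType(m.getReturnType(), m.getParameterTypes())
            .toMethodDescriptorString();
    }

    // True because the method was declared with `String...`.
    static boolean isVarArgs() {
        return lookup().isVarArgs();
    }
}
```

So the `AbstractMethodError` says that the array-taking form of `groupBy`, the one Java callers link against, has no concrete body on `DataFrameImpl` at runtime.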

The error appears to come from this Scala bug involving varargs in an abstract method:
https://issues.scala-lang.org/browse/SI-9013

My current plan is to move the implementations of the methods with varargs from DataFrameImpl to DataFrame.
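
The shape of that plan, viewed from the Java side, might look roughly like the sketch below. It is an analogy only: the real change would be made in the Scala sources, and `Frame`, `FrameImpl`, `selectColumns`, and `columns` are hypothetical names standing in for DataFrame and DataFrameImpl. The point is that the varargs entry point has a concrete body in the base class, and only a non-varargs method is left abstract for the implementation class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for DataFrame; not Spark's real class.
abstract class Frame {
    // Concrete varargs entry point kept in the base class, so callers
    // always find a real method body for (String, String[]).
    public Frame select(String col, String... cols) {
        List<String> all = new ArrayList<>();
        all.add(col);
        Collections.addAll(all, cols);
        return selectColumns(all);
    }

    // Only the non-varargs method stays abstract.
    protected abstract Frame selectColumns(List<String> cols);

    public abstract List<String> columns();
}

// Hypothetical stand-in for DataFrameImpl.
class FrameImpl extends Frame {
    private final List<String> columns;

    FrameImpl(List<String> columns) {
        this.columns = Collections.unmodifiableList(new ArrayList<>(columns));
    }

    @Override
    protected Frame selectColumns(List<String> cols) {
        return new FrameImpl(cols);
    }

    @Override
    public List<String> columns() {
        return columns;
    }
}
```

With this layout, a call such as `new FrameImpl(cols).select("features", "label")` resolves to the concrete method on the base class and never depends on a varargs method declared abstract.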

However, this may cause issues with IncomputableColumn. Feedback is welcome.

Thanks to joshrosen for figuring out the bug and the fix!

Issue Links

links to

[Github] Pull Request #4686 (rxin)

[Github] Pull Request #4751 (rxin)

People

Assignee: Reynold Xin

Reporter: Joseph K. Bradley

Dates

Created: 19/Feb/15 00:27

Updated: 24/Feb/15 23:38

Resolved: 23/Feb/15 18:11