Description
DataFrame methods with varargs fail when called from Java due to a bug in Scala.
This can be reproduced by, e.g., modifying the end of the example ml.JavaSimpleParamsExample in the master branch:
    DataFrame results = model2.transform(test);
    results.printSchema(); // works
    results.collect(); // works
    results.filter("label > 0.0").count(); // works
    for (Row r: results.select("features", "label", "myProbability", "prediction").collect()) { // fails on select
      System.out.println("(" + r.get(0) + ", " + r.get(1) + ") -> prob=" + r.get(2)
          + ", prediction=" + r.get(3));
    }
I also tried groupBy and found that it fails in the same way.
The error looks like this:
Exception in thread "main" java.lang.AbstractMethodError: org.apache.spark.sql.DataFrameImpl.groupBy(Ljava/lang/String;[Ljava/lang/String;)Lorg/apache/spark/sql/GroupedData;
    at org.apache.spark.examples.ml.JavaSimpleParamsExample.main(JavaSimpleParamsExample.java:108)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
The error appears to be from this Scala bug with using varargs in an abstract method:
https://issues.scala-lang.org/browse/SI-9013
My current plan is to move the implementations of the methods with varargs from DataFrameImpl to DataFrame.
However, this may cause issues with IncomputableColumn. Feedback is welcome.
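To make the plan concrete, here is a rough Java analogue of the shape I have in mind. The real change would be in the Scala sources, and Frame, FrameImpl and selectAll are made-up names for illustration, not Spark APIs. The idea is that the varargs method is concrete on the parent type and only forwards to an abstract method that takes no varargs, so callers never dispatch to an abstract varargs method:

```java
import java.util.Arrays;
import java.util.List;

public class VarargsDelegationSketch {

    // Hypothetical stand-in for DataFrame: the varargs method is implemented here.
    abstract static class Frame {
        // Concrete varargs entry point; it only packs the arguments and delegates.
        public final String select(String col, String... cols) {
            String[] all = new String[cols.length + 1];
            all[0] = col;
            System.arraycopy(cols, 0, all, 1, cols.length);
            return selectAll(Arrays.asList(all));
        }

        // Abstract hook without varargs; this is all the implementation class provides.
        protected abstract String selectAll(List<String> cols);
    }

    // Hypothetical stand-in for DataFrameImpl.
    static class FrameImpl extends Frame {
        @Override
        protected String selectAll(List<String> cols) {
            // Returns a description of the call instead of doing real work.
            return "select(" + String.join(", ", cols) + ")";
        }
    }

    public static void main(String[] args) {
        Frame df = new FrameImpl();
        // prints "select(features, label, prediction)"
        System.out.println(df.select("features", "label", "prediction"));
    }
}
```

This only shows the delegation pattern; it does not reproduce the Scala bug itself, and it says nothing about how IncomputableColumn would be affected.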
Thanks to joshrosen for tracking down the bug and the fix!