Details
Description
The following code fails with a stack trace beginning with:
15/02/26 23:21:32 ERROR Executor: Exception in task 2.0 in stage 7.0 (TID 422) org.apache.spark.sql.catalyst.errors.package$TreeNodeException: makeCopy, tree: scalaUDF(sales#2,discounts#3) at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:47) at org.apache.spark.sql.catalyst.trees.TreeNode.makeCopy(TreeNode.scala:309) at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildrenDown(TreeNode.scala:237) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:192) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:207)
Here is the 1.3.0 version of the code:
case class SalesDisc(sales: Double, discounts: Double) def makeStruct(sales: Double, disc:Double) = SalesDisc(sales, disc) sqlContext.udf.register("makeStruct", makeStruct _) val withStruct = sqlContext.sql("SELECT id, sd.sales FROM (SELECT id, makeStruct(sales, discounts) AS sd FROM customerTable) AS d") withStruct.foreach(println)
This used to work in 1.2.0. Interestingly, the following simplified version fails similarly, even though it seems to me to be VERY similar to the last test in the UDFSuite:
SELECT makeStruct(sales, discounts) AS sd FROM customerTable
The data table is defined thus:
val custs = Seq( Cust(1, "Widget Co", 120000.00, 0.00, "AZ"), Cust(2, "Acme Widgets", 410500.00, 500.00, "CA"), Cust(3, "Widgetry", 410500.00, 200.00, "CA"), Cust(4, "Widgets R Us", 410500.00, 0.0, "CA"), Cust(5, "Ye Olde Widgete", 500.00, 0.0, "MA") ) val customerTable = sc.parallelize(custs, 4).toDF() customerTable.registerTempTable("customerTable")