Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
2.0.0
-
Spark 2.0.0
Description
My Environment
Spark 2.0.0
I have included the physical plan of my application below.
Issue description
The result from a query that uses the LAST function are incorrect.
The output obtained for the column that corresponds to the last function is null .
My input data contain 3 rows .
The application resulted in 2 stages
The first stage consisted of 3 tasks .
The first task/partition contains 2 rows
The second task/partition contains 1 row
The last task/partition contain 0 rows
The result from the query executed for the LAST column call is NULL which I believe is due to the PARTIAL_LAST on the last partition .
I believe that this behavior is incorrect. The PARTIAL_LAST call on an empty partition should not return null .
== Physical Plan == InsertIntoHiveTable MetastoreRelation default, bdm_3449_tgt20, true, false +- *Project [last(C3_1)#51 AS field#102, cast(round(max(C3_0)#50, 0) as int) AS field1#103, cast(round(max(C3_0)#50, 0) as int) AS field2#104] +- SortAggregate(key=[], functions=[max(C3_0#40),last(C3_1#41, false)], output=[max(C3_0)#50,last(C3_1)#51]) +- SortAggregate(key=[], functions=[partial_max(C3_0#40),partial_last(C3_1#41, false)], output=[max#91,last#92]) +- *Project [CAST(sum(C1_0) AS DOUBLE)#27 AS C3_0#40, last(C1_1)#28 AS C3_1#41] +- SortAggregate(key=[], functions=[sum(cast(C1_0#17 as bigint)),last(C1_1#18, false)], output=[CAST(sum(C1_0) AS DOUBLE)#27,last(C1_1)#28]) +- Exchange SinglePartition +- SortAggregate(key=[], functions=[partial_sum(cast(C1_0#17 as bigint)),partial_last(C1_1#18, false)], output=[sum#95L,last#96]) +- *Project [field1#7 AS C1_0#17, field#6 AS C1_1#18] +- HiveTableScan [field1#7, field#6], MetastoreRelation default, bdm_3449_src, alias