Description
This query hits an interesting case where the big-table work is empty: in the plan below, Stage-3 contains only local work. Here is the MR plan:
STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-3 depends on stages: Stage-4
  Stage-0 depends on stages: Stage-3

STAGE PLANS:
  Stage: Stage-4
    Map Reduce Local Work
      Alias -> Map Local Tables:
        b
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        b
          TableScan
            alias: b
            Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: UDFToDouble(key) is not null (type: boolean)
              Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE Column stats: NONE
              HashTable Sink Operator
                condition expressions:
                  0 {key}
                  1 {value}
                keys:
                  0 UDFToDouble(key) (type: double)
                  1 UDFToDouble(key) (type: double)

  Stage: Stage-3
    Map Reduce
      Local Work:
        Map Reduce Local Work

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
The plan for Spark is not correct. We need to investigate the issue.