Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- Impala 3.1.0
- None
Description
Consider this extract from a query plan:
Operator                  #Rows   Est. #Rows
--------------------------------------------------------------
…
|  10:HASH JOIN           9.53M       18.14K
|  |--19:EXCHANGE             1            1
|  |  00:SCAN HDFS            1            1
|  06:NESTED LOOP JOIN    4.88B      863.84K
|  |--18:EXCHANGE             1            1
|  |  04:SCAN HDFS            1            1
|  05:HASH JOIN           9.53M      863.84K
If the above is to be believed, the 06 nested loop join produced about 5 billion rows (4.88B). But that reported number is far too large: joining 1 row with roughly 10 million rows (9.53M) cannot produce about 500 times that many rows.
It appears that the nested loop join actually processed and returned the 9.5 million rows, since that is the same number produced by the 10 hash join which joins a single row with the output of the nested loop join.
Because this same bogus result appears across multiple plans, it is likely that the #Rows value reported for the nested loop join is completely wrong and bears no relation to the number of rows actually returned.
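The arithmetic behind this argument can be checked with a short sketch. It assumes join 06 is an inner or cross join, so its output cannot exceed the product of its two input row counts; the helper names `parse_count` and `max_join_output` are hypothetical and are not part of Impala. The counts are copied from the plan extract above.

```python
# Hypothetical helpers for sanity-checking the summary; not an Impala API.
SUFFIXES = {"K": 1e3, "M": 1e6, "B": 1e9}


def parse_count(text):
    """Convert an abbreviated summary count such as '9.53M' or '1' to a float."""
    text = text.strip()
    if text[-1] in SUFFIXES:
        return float(text[:-1]) * SUFFIXES[text[-1]]
    return float(text)


def max_join_output(left_rows, right_rows):
    """Upper bound on rows from an inner or cross join (assumption): the cross product."""
    return left_rows * right_rows


left = parse_count("9.53M")      # output of 05:HASH JOIN, one input of join 06
right = parse_count("1")         # output of 18:EXCHANGE, the other input of join 06
reported = parse_count("4.88B")  # #Rows reported for 06:NESTED LOOP JOIN

bound = max_join_output(left, right)
ratio = reported / bound
exceeds = reported > bound
print(f"bound={bound:,.0f} reported={reported:,.0f} ratio={ratio:.0f}x exceeds={exceeds}")
```

With these inputs the reported count exceeds the bound by a factor of roughly 500, which matches the reasoning in the description: the reported value cannot be a real output row count for this join.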
Issue Links
- is duplicated by: IMPALA-8956 Row count incorrect in summary while query running (Resolved)