Details
- Type: Improvement
- Status: Closed
- Priority: Critical
- Resolution: Fixed
Description
Currently, we do not support FULL OUTER JOIN in MapJoin.
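To illustrate what a FULL OUTER hash (map) join has to do beyond an INNER or LEFT OUTER one, here is a minimal single-process sketch. It is not Hive's implementation; the function name `full_outer_hash_join` and the row layout (`(key, value)` pairs) are assumptions made for the example. The extra work is the final pass: the join must remember which small-table keys were ever matched so that it can emit the unmatched small-table rows after the big table has been fully streamed.

```python
from collections import defaultdict


def full_outer_hash_join(big_rows, small_rows):
    """Full outer join of two lists of (key, value) pairs.

    Returns (key, big_value, small_value) tuples; None marks the side
    that had no matching row. Hypothetical helper, for illustration only.
    """
    # Build phase: load the small table into an in-memory hash table.
    hash_table = defaultdict(list)
    for key, value in small_rows:
        hash_table[key].append(value)

    matched_keys = set()
    result = []

    # Probe phase: stream the big table through the hash table.
    for key, big_value in big_rows:
        # As in SQL, a NULL (None) key never matches anything.
        if key is not None and key in hash_table:
            matched_keys.add(key)
            for small_value in hash_table[key]:
                result.append((key, big_value, small_value))
        else:
            # Big-table row without a match: pad the small side.
            result.append((key, big_value, None))

    # Final phase (the FULL OUTER part): emit small-table rows whose key
    # was never matched by any big-table row.
    for key, small_values in hash_table.items():
        if key not in matched_keys:
            for small_value in small_values:
                result.append((key, None, small_value))

    return result


big = [(1, "a"), (2, "b"), (2, "c")]
small = [(2, "x"), (3, "y")]
joined = full_outer_hash_join(big, small)
```

In this sketch the matched-key set is only valid once every big-table row has been seen, which is why the unmatched small-table rows cannot be produced until the very end.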
Rough TPC-DS timings, run on a laptop:
(NOTE: In Query 51, PTF makes up a larger serial portion of the query, so Amdahl's law limits the overall gain.)
With FULL OUTER MapJoin OFF, the join runs as a MergeJoin.
Query 51:
- Vectorization OFF
  - FULL OUTER MapJoin OFF: 4:30 minutes
  - FULL OUTER MapJoin ON: 4:37 minutes
- Vectorization ON
  - FULL OUTER MapJoin OFF: 2:35 minutes
  - FULL OUTER MapJoin ON: 1:47 minutes
Query 97:
- Vectorization OFF
  - FULL OUTER MapJoin OFF: 2:37 minutes
  - FULL OUTER MapJoin ON: 2:42 minutes
- Vectorization ON
  - FULL OUTER MapJoin OFF: 1:17 minutes
  - FULL OUTER MapJoin ON: 0:06 minutes
FULL OUTER JOIN of 10,000,000 rows against 323,910 small-table keys:
- Vectorization ON
  - FULL OUTER MapJoin OFF: 14:56 minutes
  - FULL OUTER MapJoin ON: 1:45 minutes
FULL OUTER JOIN of 10,000,000 rows against 1,000 small-table keys:
- Vectorization ON
  - FULL OUTER MapJoin OFF: 12:37 minutes
  - FULL OUTER MapJoin ON: 1:38 minutes
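The timings above are easier to compare as speedup ratios. The short script below converts the reported `m:ss` values to seconds and divides the MapJoin OFF time by the MapJoin ON time. The helper names (`to_seconds`, `speedup`, `amdahl_speedup`) are introduced here for illustration. The Amdahl's law helper is included only to show the formula behind the Query 51 note; the serial fraction of Query 51 is not given in this issue, so no value for it is assumed.

```python
def to_seconds(mmss):
    """Convert an 'm:ss' string such as '4:30' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def speedup(off, on):
    """Ratio of the MapJoin OFF time to the MapJoin ON time."""
    return to_seconds(off) / to_seconds(on)


def amdahl_speedup(fraction, factor):
    """Amdahl's law: overall speedup when `fraction` of the runtime
    is accelerated by `factor` and the rest is unchanged."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# (MapJoin OFF, MapJoin ON) timings copied from the list above.
timings = {
    "Query 51, vectorization OFF": ("4:30", "4:37"),
    "Query 51, vectorization ON": ("2:35", "1:47"),
    "Query 97, vectorization OFF": ("2:37", "2:42"),
    "Query 97, vectorization ON": ("1:17", "0:06"),
    "10,000,000 rows x 323,910 keys, vectorization ON": ("14:56", "1:45"),
    "10,000,000 rows x 1,000 keys, vectorization ON": ("12:37", "1:38"),
}

speedups = {label: speedup(off, on) for label, (off, on) in timings.items()}

for label, ratio in speedups.items():
    print(f"{label}: {ratio:.2f}x")
```

With vectorization ON, the ratios come out at roughly 1.45x for Query 51, 12.8x for Query 97, and 8.5x and 7.7x for the two synthetic joins. With vectorization OFF, both queries are marginally slower with MapJoin ON (about 0.97x).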
Hopefully, someone will do large-scale cluster testing. A [DynamicPartitionedHashJoin] MapJoin should scale dramatically better than the reduce-shuffle of a [Sort] MergeJoin.
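As a rough sketch of the general idea behind a partitioned hash join (an illustration under stated assumptions, not Hive's actual code), the example below hash-partitions both inputs on the join key. Because every row with a given key lands in the same partition, each partition can decide locally which small-table keys went unmatched, with no global sort and no coordination between partitions. All function names here are hypothetical.

```python
def partition_by_key(rows, num_partitions):
    """Split (key, value) rows into buckets by hash of the join key."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in rows:
        partitions[hash(key) % num_partitions].append((key, value))
    return partitions


def unmatched_small_keys(big_rows, small_rows):
    """Small-table keys that no big-table row matches."""
    big_keys = {key for key, _ in big_rows}
    return {key for key, _ in small_rows if key not in big_keys}


def unmatched_small_keys_partitioned(big_rows, small_rows, num_partitions):
    """Same result, computed independently per partition and then unioned."""
    big_parts = partition_by_key(big_rows, num_partitions)
    small_parts = partition_by_key(small_rows, num_partitions)
    unmatched = set()
    for big_part, small_part in zip(big_parts, small_parts):
        # Each partition only looks at its own slice of both tables.
        unmatched |= unmatched_small_keys(big_part, small_part)
    return unmatched


big = [(k % 50, f"big-{k}") for k in range(1000)]   # keys 0..49
small = [(k, f"small-{k}") for k in range(40, 60)]  # keys 40..59

global_result = unmatched_small_keys(big, small)
partitioned_result = unmatched_small_keys_partitioned(big, small, 4)
```

Both calls return the same set of keys (50 through 59 in this example), which is the property that lets each partition emit its own unmatched small-table rows.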
Attachments
Issue Links
- causes
  - HIVE-21288 Runtime rowcount calculation is incorrect in vectorized executions (Closed)
  - HIVE-21923 Vectorized MapJoin may miss results when only the join key is selected (Closed)