Details
- Bug
- Status: Resolved
- Blocker
- Resolution: Fixed
- Impala 0.7
- None
- None
Description
Running the tpchq4 query on the 10-node cluster results in terrible performance. It is fine on the 17-node cluster, so the cause is probably something about the key distribution.
HASH_JOIN_NODE (id=2):(24m27s 88.96%)
- BuildBuckets: 1.02K (1024) <--- Few build buckets
- BuildRows: 573.38K (573377) <--- Lots of keys, indicating they have all collided on the same bucket
- BuildTime: 73.118ms
- MemoryUsed: 0.00
- ProbeRows: 37.94M (37935647)
- ProbeTime: 22m16s <--- Ridiculous amount of time on the probe side, indicating we are spending a lot of time looking through a long chained bucket.
- RowsReturned: 1.45M (1449806)
- RowsReturnedRate: 987.00 /sec
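
The suspicion noted in the profile (keys piling into one long chain) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Impala's hash table. The two hash functions, the skewed key pattern, and all names are hypothetical; only the three counter values are taken from the profile above.

```python
from collections import Counter

NUM_BUCKETS = 1024       # BuildBuckets from the profile
BUILD_ROWS = 573_377     # BuildRows from the profile
PROBE_ROWS = 37_935_647  # ProbeRows from the profile


def chain_lengths(keys, num_buckets, hash_fn):
    """Count how many keys land in each bucket of a chained hash table."""
    return Counter(hash_fn(k) % num_buckets for k in keys)


def identity_hash(key):
    # Hypothetical weak hash: the key itself, so only its low bits pick the bucket.
    return key


def mixing_hash(key):
    # Hypothetical multiplicative hash: multiply by an odd 64-bit constant,
    # wrap to 64 bits, and keep the upper 32 bits.
    return ((key * 0x9E3779B97F4A7C15) & 0xFFFFFFFFFFFFFFFF) >> 32


# Hypothetical skewed input: every key is a multiple of the bucket count.
keys = [i * NUM_BUCKETS for i in range(10_000)]

weak = chain_lengths(keys, NUM_BUCKETS, identity_hash)
mixed = chain_lengths(keys, NUM_BUCKETS, mixing_hash)

longest_weak = max(weak.values())
longest_mixed = max(mixed.values())
print(f"identity hash: {len(weak)} bucket(s) used, longest chain {longest_weak}")
print(f"mixing hash:   {len(mixed)} bucket(s) used, longest chain {longest_mixed}")

# Back-of-the-envelope cost using the profile counters, assuming the table
# stays at 1024 buckets and every probe walks one average-length chain.
avg_chain = BUILD_ROWS / NUM_BUCKETS
print(f"average chain length: {avg_chain:.1f} rows per bucket")
print(f"approximate probe comparisons: {PROBE_ROWS * avg_chain:.3g}")
```

With the identity hash, every skewed key lands in bucket 0, so each probe has to walk one chain holding all the keys. Separately, the arithmetic at the end shows that 573,377 build rows in 1,024 buckets average roughly 560 rows per bucket even with a perfectly uniform hash, so a bucket count this small would make probes expensive regardless of how the keys collide.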