Details
Bug
Status: Open
Major
Resolution: Unresolved
1.3.0
None
None
Description
We have encountered substantial skew in the hash-based operators (hash distribution, hash aggregation, and hash join) for certain data sets. Two such issues are DRILL-2803 and DRILL-4119.
It would be very useful to have a unit test suite that measures the quality of our hashing.
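A minimal sketch of one check such a suite might contain follows. It assumes keys are placed with a simple "hash mod number of buckets" rule, as a hash exchange or hash table might do, and measures unevenness with a chi-squared statistic and a max-to-mean bucket ratio. The MurmurHash3 fmix64 finalizer is used purely as a stand-in; it is not Drill's hash function, and the class and method names (HashSkewCheck, bucketCounts, chiSquared, maxToMeanRatio) are hypothetical.

```java
import java.util.function.LongUnaryOperator;

/**
 * Hypothetical hash-quality check: hash keys into buckets the way a hash
 * exchange or hash table would, then measure how uneven the buckets are.
 */
final class HashSkewCheck {

    /** MurmurHash3 64-bit finalizer (fmix64), used here only as a stand-in hash. */
    static long fmix64(long k) {
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return k;
    }

    /** Counts how many keys land in each bucket, using hash mod numBuckets. */
    static long[] bucketCounts(long[] keys, LongUnaryOperator hash, int numBuckets) {
        long[] counts = new long[numBuckets];
        for (long key : keys) {
            // floorMod keeps the bucket index non-negative for negative hash values.
            int bucket = (int) Math.floorMod(hash.applyAsLong(key), (long) numBuckets);
            counts[bucket]++;
        }
        return counts;
    }

    /** Pearson chi-squared statistic against a uniform distribution over the buckets. */
    static double chiSquared(long[] counts) {
        long total = 0;
        for (long c : counts) total += c;
        double expected = (double) total / counts.length;
        double stat = 0.0;
        for (long c : counts) {
            double diff = c - expected;
            stat += diff * diff / expected;
        }
        return stat;
    }

    /** Largest bucket divided by the mean bucket size; 1.0 means perfectly even. */
    static double maxToMeanRatio(long[] counts) {
        long total = 0;
        long max = 0;
        for (long c : counts) {
            total += c;
            max = Math.max(max, c);
        }
        return max / ((double) total / counts.length);
    }

    public static void main(String[] args) {
        int numBuckets = 64;
        // Keys that are all multiples of the bucket count: a classic skew trigger.
        long[] keys = new long[10_000];
        for (int i = 0; i < keys.length; i++) keys[i] = (long) i * numBuckets;

        long[] identity = bucketCounts(keys, k -> k, numBuckets);
        long[] mixed = bucketCounts(keys, HashSkewCheck::fmix64, numBuckets);
        System.out.printf("identity: chi2=%.1f max/mean=%.2f%n",
                chiSquared(identity), maxToMeanRatio(identity));
        System.out.printf("fmix64:   chi2=%.1f max/mean=%.2f%n",
                chiSquared(mixed), maxToMeanRatio(mixed));
    }
}
```

With the identity "hash", every key lands in bucket 0 (max/mean of 64), while a well-mixed hash spreads the same keys close to evenly. A real test would presumably feed Drill's own 32-bit, 64-bit, and AsDouble hash functions through the same metrics and assert thresholds, for example comparing the chi-squared value against the critical value for numBuckets - 1 degrees of freedom.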
The number of combinations is large: num_data_types x nullability x num_hash_function_types (32-bit, 64-bit, and AsDouble variants), further multiplied by the characteristics of the data itself. We would have to be judicious in picking a reasonable subset of this space. We should also look at existing open-source test suites in this area.
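One common way to pick a reasonable subset, offered here as an assumption rather than something this issue prescribes, is pairwise (all-pairs) coverage: choose combinations so that every pair of values from any two dimensions is exercised at least once. The following sketch enumerates the full grid and greedily selects such a subset. The class name HashTestMatrix, the hash-variant labels, and the data-shape labels are hypothetical placeholders; the data types shown are only a small sample.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper that shrinks a test matrix with greedy pairwise coverage. */
final class HashTestMatrix {

    /** Enumerates every combination; each combination is an array of value indices. */
    static List<int[]> fullProduct(int[] sizes) {
        List<int[]> out = new ArrayList<>();
        int[] cur = new int[sizes.length];
        while (true) {
            out.add(cur.clone());
            // Odometer-style increment: bump the last dimension, carry on overflow.
            int d = sizes.length - 1;
            while (d >= 0 && ++cur[d] == sizes[d]) {
                cur[d] = 0;
                d--;
            }
            if (d < 0) return out;
        }
    }

    /** All (dimension, value) pairs that a single combination exercises. */
    static Set<String> pairsOf(int[] combo) {
        Set<String> pairs = new HashSet<>();
        for (int i = 0; i < combo.length; i++)
            for (int j = i + 1; j < combo.length; j++)
                pairs.add(i + "=" + combo[i] + "," + j + "=" + combo[j]);
        return pairs;
    }

    /** Repeatedly picks the combination covering the most still-uncovered pairs. */
    static List<int[]> greedyPairwise(int[] sizes) {
        List<int[]> candidates = fullProduct(sizes);
        Set<String> uncovered = new HashSet<>();
        for (int[] c : candidates) uncovered.addAll(pairsOf(c));
        List<int[]> chosen = new ArrayList<>();
        while (!uncovered.isEmpty()) {
            int[] best = null;
            int bestGain = -1;
            for (int[] c : candidates) {
                int gain = 0;
                for (String p : pairsOf(c)) if (uncovered.contains(p)) gain++;
                if (gain > bestGain) {
                    bestGain = gain;
                    best = c;
                }
            }
            chosen.add(best);
            uncovered.removeAll(pairsOf(best));
        }
        return chosen;
    }

    public static void main(String[] args) {
        String[][] dims = {
            {"INT", "BIGINT", "FLOAT8", "VARCHAR"},                          // sample data types
            {"REQUIRED", "OPTIONAL"},                                        // nullability
            {"hash32", "hash64", "hash32AsDouble", "hash64AsDouble"},        // hypothetical labels
            {"sequential", "multiplesOfBuckets", "random", "heavyHitters"}   // hypothetical shapes
        };
        int[] sizes = new int[dims.length];
        for (int d = 0; d < dims.length; d++) sizes[d] = dims[d].length;

        List<int[]> subset = greedyPairwise(sizes);
        System.out.println("full grid: " + fullProduct(sizes).size()
                + ", pairwise subset: " + subset.size());
        for (int[] c : subset) {
            StringBuilder sb = new StringBuilder();
            for (int d = 0; d < c.length; d++) sb.append(d == 0 ? "" : " x ").append(dims[d][c[d]]);
            System.out.println(sb);
        }
    }
}
```

For these four dimensions the full grid has 128 combinations, and the greedy subset covers all 72 value pairs with far fewer. Pairwise coverage will not catch skew that only appears in a specific three-way combination, so targeted cases for known problem data (such as the keys behind DRILL-2803 and DRILL-4119) would likely still be worth adding by hand.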