Details
Type: Task
Status: Open
Priority: Major
Resolution: Unresolved
Description
There have been a few instances where the hash function used by Drill produced skewed results on different data sets. It would be good to create tests that can detect skew in the hash codes produced by the hash functions. This will help avoid regressions caused by changes to hash function usage or implementations. The tests can create data on the fly, such as:
1) Set of random numbers
2) Set of randomly generated strings
3) Set of random strings with the same prefix
4) Set of random strings with the same suffix
5) Set of continuous numbers
The tests should also include the issue data sets found during the investigations in DRILL-4237 / DRILL-5816 / DRILL-4119.
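
A minimal sketch of what such a skew check might look like is given below. It is an illustration under stated assumptions, not Drill code: `HashSkewSketch`, `standInHash` (a plain 32-bit FNV-1a), `skewRatio`, the bucket count of 64 and the threshold of 1.5 are all hypothetical choices made for this example. A real test would plug Drill's own hash functions into the `ToIntFunction<byte[]>` parameter and add the issue data sets as further inputs. Skew is measured here as the size of the fullest bucket divided by the mean bucket size, so 1.0 means a perfectly even spread.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.function.IntFunction;
import java.util.function.ToIntFunction;

public class HashSkewSketch {

  // Stand-in hash (32-bit FNV-1a), used only for illustration.
  // Assumption: a real test would call Drill's hash functions here instead.
  static int standInHash(byte[] data) {
    int h = 0x811C9DC5;
    for (byte b : data) {
      h ^= (b & 0xff);
      h *= 0x01000193;
    }
    return h;
  }

  // Fullest bucket divided by mean bucket size; 1.0 is a perfectly even spread.
  static double skewRatio(List<byte[]> keys, ToIntFunction<byte[]> hash, int buckets) {
    int[] counts = new int[buckets];
    for (byte[] key : keys) {
      counts[Math.floorMod(hash.applyAsInt(key), buckets)]++;
    }
    int max = 0;
    for (int c : counts) {
      max = Math.max(max, c);
    }
    return max / ((double) keys.size() / buckets);
  }

  // Builds a data set of n keys from a per-index generator.
  static List<byte[]> generate(int n, IntFunction<byte[]> gen) {
    List<byte[]> keys = new ArrayList<>(n);
    for (int i = 0; i < n; i++) {
      keys.add(gen.apply(i));
    }
    return keys;
  }

  static byte[] longBytes(long v) {
    return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
  }

  static byte[] utf8(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  static String randomString(Random r, int len) {
    StringBuilder sb = new StringBuilder(len);
    for (int i = 0; i < len; i++) {
      sb.append((char) ('a' + r.nextInt(26)));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    Random r = new Random(42);          // fixed seed so runs are repeatable
    int n = 100_000;
    int buckets = 64;                   // hypothetical bucket count
    double threshold = 1.5;             // hypothetical pass/fail limit

    // The five generated data sets listed in the description.
    Map<String, List<byte[]>> dataSets = new LinkedHashMap<>();
    dataSets.put("random numbers", generate(n, i -> longBytes(r.nextLong())));
    dataSets.put("random strings", generate(n, i -> utf8(randomString(r, 16))));
    dataSets.put("strings, same prefix", generate(n, i -> utf8("prefix_" + randomString(r, 16))));
    dataSets.put("strings, same suffix", generate(n, i -> utf8(randomString(r, 16) + "_suffix")));
    dataSets.put("continuous numbers", generate(n, i -> longBytes(i)));

    for (Map.Entry<String, List<byte[]>> e : dataSets.entrySet()) {
      double ratio = skewRatio(e.getValue(), HashSkewSketch::standInHash, buckets);
      System.out.printf("%-22s skew ratio = %.3f%s%n",
          e.getKey(), ratio, ratio > threshold ? "  <-- skewed" : "");
    }
  }
}
```

The max-to-mean ratio is only one possible metric. A chi-square test on the bucket counts would be a stricter alternative, and checking several bucket counts (for example powers of two as well as primes) would help catch skew that only shows up for particular partition counts.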