Apache Drill / DRILL-4122

Create unit test suite for checking quality of hashing for hash based operators

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.3.0
    • Fix Version/s: None
    • Component/s: Functions - Drill
    • Labels: None

    Description

      We have encountered substantial skew in the hash-based operators (hash distribution, hash aggregation, hash join) for certain data sets. Two such issues are DRILL-2803 and DRILL-4119.

      It would be very useful to have a unit test suite that tests the quality of hashing.
      The number of combinations is large: num_data_types x nullability x num_hash_function_types (32-bit, 64-bit, and AsDouble variations), and the nature of the data itself adds a further dimension. We would have to be judicious about picking a reasonable subset of this space. We should also look at existing open source test suites in this area.
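
      As a rough illustration of what one such check could look like, here is a minimal sketch that hashes a set of keys into buckets and reports the ratio of the fullest bucket to the ideal mean load. Everything in it is an assumption made for illustration: the class name HashSkewSketch, the helper names, the strided key pattern, and the stand-in mixer mix32 (the 32-bit MurmurHash3 finalizer) are hypothetical and are not Drill's hash functions or operator code. A real suite would plug Drill's own hash implementations in through the function parameter.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical sketch: measures bucket skew for an arbitrary int -> int hash.
final class HashSkewSketch {

    // Stand-in mixer (32-bit MurmurHash3 finalizer). NOT Drill's hash function.
    static int mix32(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 16;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Counts how many keys land in each bucket under the given hash function.
    static long[] bucketCounts(int[] keys, int numBuckets, IntUnaryOperator hash) {
        long[] counts = new long[numBuckets];
        for (int key : keys) {
            // floorMod keeps the bucket index non-negative for negative hashes.
            counts[Math.floorMod(hash.applyAsInt(key), numBuckets)]++;
        }
        return counts;
    }

    // Fullest bucket divided by the ideal mean load; 1.0 means perfectly even.
    static double skewRatio(long[] counts) {
        long max = 0;
        long total = 0;
        for (long c : counts) {
            max = Math.max(max, c);
            total += c;
        }
        return max / ((double) total / counts.length);
    }

    // Keys that are multiples of a stride, a pattern that defeats weak hashes.
    static int[] stridedKeys(int n, int stride) {
        int[] keys = new int[n];
        for (int i = 0; i < n; i++) {
            keys[i] = i * stride;
        }
        return keys;
    }

    public static void main(String[] args) {
        int[] keys = stridedKeys(100_000, 64);
        double identity = skewRatio(bucketCounts(keys, 64, k -> k));
        double mixed = skewRatio(bucketCounts(keys, 64, HashSkewSketch::mix32));
        System.out.println("identity hash skew ratio: " + identity);
        System.out.println("mix32 hash skew ratio:    " + mixed);
    }
}
```

      With the identity function every strided key falls into bucket 0, so the ratio equals the bucket count, while a well-mixing function should stay close to 1. A unit test could assert an upper bound on this ratio (or on a chi-squared statistic) for each chosen combination of data type, nullability, and hash function variant.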

          People

            Assignee: Chunhui Shi (cshi)
            Reporter: Aman Sinha (amansinha100)
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated: