Apache Drill / DRILL-4119

Skew in hash distribution for varchar (and possibly other) types of data


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3.0
    • Fix Version/s: 1.4.0
    • Component/s: Functions - Drill
    • Labels: None

    Description

      We are seeing substantial skew in the hash distribution of an Id column that contains varchar data of length 32. It is easily reproduced with a group-by query:

      EXPLAIN PLAN FOR SELECT SomeId FROM table GROUP BY SomeId;
      ...
      01-02          HashAgg(group=[{0}])
      01-03            Project(SomeId=[$0])
      01-04              HashToRandomExchange(dist0=[[$0]])
      02-01                UnorderedMuxExchange
      03-01                  Project(SomeId=[$0], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($0))])
      03-02                    HashAgg(group=[{0}])
      03-03                      Project(SomeId=[$0])
      

      The string IDs have the following form (32 hexadecimal characters):

      e4b4388e8865819126cb0e4dcaa7261d
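
      One way to quantify skew like this outside of Drill is to hash a set of IDs, assign each to a bucket by taking the hash modulo the number of receivers, and compare the largest bucket with the mean. The sketch below is illustrative only: `stand_in_hash`, `constant_hash`, the bucket count of 8, and the synthetic IDs are all assumptions made for the example, and the hash used is not Drill's `hash64AsDouble`.

```python
import hashlib
from collections import Counter


def bucket_counts(ids, num_buckets, hash_fn):
    """Count how many ids land in each of num_buckets hash partitions."""
    counts = Counter(hash_fn(s) % num_buckets for s in ids)
    return [counts.get(b, 0) for b in range(num_buckets)]


def skew_ratio(counts):
    """Largest bucket divided by the mean bucket size; 1.0 is perfectly even."""
    mean = sum(counts) / len(counts)
    return max(counts) / mean


def stand_in_hash(s):
    # Stand-in 32-bit hash (first 4 bytes of MD5); NOT Drill's hash64AsDouble.
    return int.from_bytes(hashlib.md5(s.encode("utf-8")).digest()[:4], "big")


def constant_hash(s):
    # Deliberately degenerate hash, to show what total skew looks like.
    return 7


# Synthetic 32-character hex ids shaped like the sample in this report.
ids = [hashlib.md5(str(i).encode("utf-8")).hexdigest() for i in range(10_000)]

even = bucket_counts(ids, 8, stand_in_hash)
bad = bucket_counts(ids, 8, constant_hash)

print("stand-in hash:", even, "skew ratio:", round(skew_ratio(even), 2))
print("constant hash:", bad, "skew ratio:", skew_ratio(bad))
```

      A ratio close to 1.0 indicates an even distribution; a ratio approaching the number of buckets means nearly all rows are being sent to a single receiver. Substituting the real ID values and the hash under suspicion would show whether the skew comes from the data or from the hash expression.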
      

          People

            Assignee: Aman Sinha (amansinha100)
            Reporter: Aman Sinha (amansinha100)
            Victoria Markman
            Votes: 0
            Watchers: 6
