Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-1887

Wrong result for hash join on null valued columns

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 0.8.0
    • None
    • None

    Description

      git.commit.id.abbrev=9dfa4a1

      Dataset1:

      {
       "col1":1,
       "col2":"abc"
      }
      {
       "col1":2,
       "col2":null
      }
      

      Dataset 2:

      {
       "col1":1,
       "col2":null
      }
      {
       "col1":2,
       "col2":null
      }
      {
       "col1":2,
       "col2":"abc"
      }
      

      Query :

      select * from `a.json` a join `b.json` b on a.col2=b.col2; 
      +------------+------------+------------+------------+
      |    col1    |    col2    |   col10    |   col20    |
      +------------+------------+------------+------------+
      | 1          | abc        | 2          | abc        |
      | 2          | null       | 1          | null       |
      | 2          | null       | 2          | null       |
      +------------+------------+------------+------------+
      

      Most database engines treat 2 NULL values as not being equal. This makes it hard to compare DRILL's results against any Standard SQL engine. However I am not sure whether drill is intentionally designed to behave this way.

      Attachments

        Activity

          People

            amansinha100 Aman Sinha
            rkins Rahul Kumar Challapalli
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: