Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-1897

Using 'avg' along with 'length' results in wrong output when nulls are present

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 0.9.0
    • Functions - Drill
    • None

    Description

      git.commit.id.abbrev=9dfa4a1

      Dataset :

      {
       "col1":1,
       "col2":"abc"
      }
      {
       "col1":1,
       "col2":null
      }
      {
       "col1":null,
       "col2":null
      }
      

      The below query should return 3 instead of 1 (postgres correctly returns 3).
      Query :

      select avg(length(col2)) from `a.json`;
      +------------+
      |   EXPR$0   |
      +------------+
      | 1.0        |
      +------------+
      

      Plan for the above query :

      00-00    Screen
      00-01      Project(EXPR$0=[CAST(/(CastHigh(CASE(=($1, 0), null, $0)), $1)):ANY])
      00-02        StreamAgg(group=[{}], agg#0=[$SUM0($0)], agg#1=[COUNT()])
      00-03          Project($f0=[length($0)])
      00-04            Scan(groupscan=[EasyGroupScan [selectionRoot=/drill/testdata/data-shapes/wide-columns/flat/json/a.json, numFiles=1, columns=[`col2`], files=[maprfs:/drill/testdata/data-shapes/wide-columns/flat/json/a.json]]])
      

      The below query which does not involve a second function works as expected even though nulls are present

       select avg(col1) from `a.json`;
      +------------+
      |   EXPR$0   |
      +------------+
      | 1.0        |
      +------------+
      

      Let me know if you need any more information.

      Attachments

        1. DRILL-1897.patch
          5 kB
          Mehant Baid

        Activity

          People

            mehant Mehant Baid
            rkins Rahul Kumar Challapalli
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: