Apache Drill / DRILL-1434

count of a nullable column in tpcds gives incorrect results

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.6.0
    • Fix Version/s: 0.8.0
    • Component/s: Functions - Drill
    • Labels: None

    Description

      code base
      #Fri Sep 12 14:08:02 PDT 2014
      git.commit.id.abbrev=9e16466

      I have a Parquet file (TPC-DS data) that contains null values in one column. The total count of the column is:

      0: jdbc:drill:schema=dfs> select count(ss_quantity) from `tpcds/p1/store_sales.parquet`;
      +------------+
      |   EXPR$0   |
      +------------+
      | 2880404    |
      +------------+

      The count excluding null values is:

      0: jdbc:drill:schema=dfs> select count(ss_quantity) from `tpcds/p1/store_sales.parquet` where ss_quantity is not null;
      +------------+
      |   EXPR$0   |
      +------------+
      | 2750408    |
      +------------+

      But the count of null values is zero:

      0: jdbc:drill:schema=dfs> select count(ss_quantity) from `tpcds/p1/store_sales.parquet` where ss_quantity is null;
      +------------+
      |   EXPR$0   |
      +------------+
      | 0          |
      +------------+
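
      For reference, standard SQL defines COUNT(column) as counting only non-null values, while COUNT(*) counts rows. Under those semantics the first two queries above would be expected to agree, and the null rows themselves would be counted with COUNT(*) plus the IS NULL filter. The following is a minimal sketch of that behavior, assuming Python's built-in sqlite3 as a stand-in engine (not Drill) and a small made-up table rather than the TPC-DS data; the function name null_count_demo is hypothetical.

```python
import sqlite3


def null_count_demo():
    """Return COUNT results for a nullable column in a tiny in-memory table.

    Order of results: COUNT(col), COUNT(col) WHERE col IS NOT NULL,
    COUNT(col) WHERE col IS NULL, COUNT(*) WHERE col IS NULL.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE store_sales (ss_quantity INTEGER)")
    # Hypothetical sample data: three non-null values and two nulls.
    conn.executemany(
        "INSERT INTO store_sales VALUES (?)",
        [(1,), (2,), (None,), (3,), (None,)],
    )
    queries = [
        "SELECT COUNT(ss_quantity) FROM store_sales",
        "SELECT COUNT(ss_quantity) FROM store_sales WHERE ss_quantity IS NOT NULL",
        "SELECT COUNT(ss_quantity) FROM store_sales WHERE ss_quantity IS NULL",
        "SELECT COUNT(*) FROM store_sales WHERE ss_quantity IS NULL",
    ]
    results = tuple(conn.execute(q).fetchone()[0] for q in queries)
    conn.close()
    return results


print(null_count_demo())  # prints (3, 3, 0, 2)
```

      In this sketch COUNT(ss_quantity) matches the IS NOT NULL count, and only COUNT(*) reports the two null rows.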

      Here is what the physical plan looks like for this query:

      0: jdbc:drill:schema=dfs> explain plan for select count(ss_quantity) from `tpcds/p1/store_sales.parquet` where ss_quantity is null;
      +----------------------+
      | text | json |
      +----------------------+
      00-00    Screen
      00-01      StreamAgg(group=[{}], EXPR$0=[COUNT($0)])
      00-02        Filter(condition=[IS NULL($0)])
      00-03          ProducerConsumer
      00-04            Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/user/root/mondrian/tpcds/p1/store_sales.parquet]], selectionRoot=/user/root/mondrian/tpcds/p1/store_sales.parquet, columns=[SchemaPath [`ss_quantity`]]]])
      {
        "head" :
        Unknown macro: { "version" }
        ,
        "graph" : [ {
          "pop" : "parquet-scan",
          "@id" : 4,
          "entries" : [ { "path" : "maprfs:/user/root/mondrian/tpcds/p1/store_sales.parquet" } ],
          "storage" : {
            "type" : "file",
            "enabled" : true,
            "connection" : "maprfs:///",
            "workspaces" :
            Unknown macro: { "default" }
            ,
            "formats" :
            Unknown macro: { "psv" }
          },
          "format" : { "type" : "parquet" },
          "columns" : [ "`ss_quantity`" ],
          "selectionRoot" : "/user/root/mondrian/tpcds/p1/store_sales.parquet",
          "cost" : 2880404.0
        },
        { "pop" : "producer-consumer", "@id" : 3, "child" : 4, "size" : 10, "initialAllocation" : 1000000, "maxAllocation" : 10000000000, "cost" : 2880404.0 },
        { "pop" : "filter", "@id" : 2, "child" : 3, "expr" : "isnull(`ss_quantity`) ", "initialAllocation" : 1000000, "maxAllocation" : 10000000000, "cost" : 720101.0 },
        Unknown macro: { "pop" }
        ,
        { "pop" : "screen", "@id" : 0, "child" : 1, "initialAllocation" : 1000000, "maxAllocation" : 10000000000, "cost" : 72010.1 }
        ]
      }
      +----------------------+

            People

              Assignee: Aman Sinha (amansinha100)
              Reporter: Chun Chang (cchang@maprtech.com)
              Votes: 0
              Watchers: 6
