Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-4513

Query returns null when star in project and filter over non null values

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Critical
    • Resolution: Unresolved
    • 1.4.0
    • None
    • Storage - JSON
    • None
    • 4 node cluster CentOS

    Description

      When there is a star in project and there are many nulls followed by a non null value in the column c1 in JSON file, query does not return correct results.

      json file with 4096 records that have a field

      {c1: null}

      followed by the 4097th record that has

      {c1: "Hello World"}

      Drill version
      http://yum.qa.lab/drill-opensource.ebf/mapr-drill-1.4.0.201603151147-1.noarch.rpm

      0: jdbc:drill:schema=dfs.tmp> select * from sys.version;
      +---------+-----------+----------------+-------------+-------------+------------+
      | version | commit_id | commit_message | commit_time | build_email | build_time |
      +---------+-----------+----------------+-------------+-------------+------------+
      | 1.4.0 | 99ec15919cd73f5e7b976f6193af45c0e46ffd7a | DRILL-4479: For empty fields under all_text_mode enabled (a) use varchar for the default columns and (b) ensure we create fields corresponding to all columns. | 14.03.2016 @ 20:52:13 UTC | Unknown | 15.03.2016 @ 18:47:50 UTC |
      +---------+-----------+----------------+-------------+-------------+------------+
      1 row selected (0.375 seconds)
      0: jdbc:drill:schema=dfs.tmp> alter system set `store.json.all_text_mode`=true;
      +-------+------------------------------------+
      |  ok   |              summary               |
      +-------+------------------------------------+
      | true  | store.json.all_text_mode updated.  |
      +-------+------------------------------------+
      1 row selected (0.136 seconds)
      0: jdbc:drill:schema=dfs.tmp> select * from `tooManyNulls.json` t where t.c1 = 'Hello World';
      Error: SYSTEM ERROR: NumberFormatException: Hello World
      
      Fragment 0:0
      
      [Error Id: f9022f22-12cd-46c7-b471-470a50506ba2 on centos-04.qa.lab:31010] (state=,code=0)
      0: jdbc:drill:schema=dfs.tmp> select * from `tooManyNulls.json` t where t.c1 IS NOT NULL;
      +-------+
      |   *   |
      +-------+
      | null  |
      +-------+
      1 row selected (0.239 seconds)
      0: jdbc:drill:schema=dfs.tmp> select t.c1 from `tooManyNulls.json` t where t.c1 = 'Hello World';
      +--------------+
      |      c1      |
      +--------------+
      | Hello World  |
      +--------------+
      1 row selected (0.282 seconds)
      0: jdbc:drill:schema=dfs.tmp> select t.c1 from `tooManyNulls.json` t where t.c1 IS NOT NULL;
      +--------------+
      |      c1      |
      +--------------+
      | Hello World  |
      +--------------+
      1 row selected (0.259 seconds)
      

      Another interesting observation to note, the below query returns null. However, when the same query is used as a sub-query, we see the query returns correct results.

      0: jdbc:drill:schema=dfs.tmp> select * from `tooManyNulls.json` t where t.c1 IS NOT NULL;
      +-------+
      |   *   |
      +-------+
      | null  |
      +-------+
      1 row selected (0.203 seconds)
      0: jdbc:drill:schema=dfs.tmp> select tmp.c1 from (select * from `tooManyNulls.json` t where t.c1 IS NOT NULL) tmp;
      +--------------+
      |      c1      |
      +--------------+
      | Hello World  |
      +--------------+
      1 row selected (0.25 seconds)
      

      Attachments

        1. tooManyNulls.json
          48 kB
          Khurram Faraaz

        Activity

          People

            Unassigned Unassigned
            khfaraaz Khurram Faraaz
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated: