Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-4704

select statement behavior is inconsistent for decimal values in parquet

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 1.6.0
    • 1.8.0
    • Functions - Drill
    • None
    • Windows 7 Pro, Java 1.8.0_91

    Description

      A select statement that searches a parquet file for a decimal value matching a specific value behaves inconsistently. The query expressed most simply finds nothing:

      0: jdbc:drill:zk=local> select * from dfs.`c:/archiveHR/HR.EMPLOYEES` where employee_id = 100;
      ---------------------------------------------------------------+

      EMPLOYEE_ID FIRST_NAME LAST_NAME EMAIL PHONE_NUMBER HIRE_DATE

      ---------------------------------------------------------------+

      ---------------------------------------------------------------+

      No rows selected (0.348 seconds)

      The query can be modified to find the matching row in a few ways, such as the following (using between instead of '=', changing 100 to 100.0, or casting as decimal:

      0: jdbc:drill:zk=local> select * from dfs.`c:/archiveHR/HR.EMPLOYEES` where employee_id between 100 and 100;
      ---------------------------------------------------------------+

      EMPLOYEE_ID FIRST_NAME LAST_NAME EMAIL PHONE_NUMBER HIR

      ---------------------------------------------------------------+

      100 Steven King SKING 515.123.4567 2003-06-1

      ---------------------------------------------------------------+

      1 row selected (0.226 seconds)
      0: jdbc:drill:zk=local> select * from dfs.`c:/archiveHR/HR.EMPLOYEES` where employee_id = 100.0;
      ---------------------------------------------------------------+

      EMPLOYEE_ID FIRST_NAME LAST_NAME EMAIL PHONE_NUMBER HIR

      ---------------------------------------------------------------+

      100 Steven King SKING 515.123.4567 2003-06-1

      ---------------------------------------------------------------+

      1 row selected (0.259 seconds)
      0: jdbc:drill:zk=local> select * from dfs.`c:/archiveHR/HR.EMPLOYEES` where cast(employee_id AS DECIMAL) = 100;
      ---------------------------------------------------------------+

      EMPLOYEE_ID FIRST_NAME LAST_NAME EMAIL PHONE_NUMBER HIR

      ---------------------------------------------------------------+

      100 Steven King SKING 515.123.4567 2003-06-1

      ---------------------------------------------------------------+

      1 row selected (0.232 seconds)
      0: jdbc:drill:zk=local>

      The schema of the parquet data that is being searched is as follows:

      $ java -jar parquet-tools*1.jar meta c:/archiveHR/HR.EMPLOYEES/1.parquet
      file: file:/c:/archiveHR/HR.EMPLOYEES/1.parquet
      creator: parquet-mr version 1.8.1 (build 4aba4dae7bb0d4edbcf7923ae1339f28fd3f7fcf)
      .....
      file schema: HR.EMPLOYEES
      --------------------------------------------------------------------------------
      EMPLOYEE_ID: REQUIRED FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:0
      FIRST_NAME: OPTIONAL BINARY O:UTF8 R:0 D:1
      LAST_NAME: REQUIRED BINARY O:UTF8 R:0 D:0
      EMAIL: REQUIRED BINARY O:UTF8 R:0 D:0
      PHONE_NUMBER: OPTIONAL BINARY O:UTF8 R:0 D:1
      HIRE_DATE: REQUIRED BINARY O:UTF8 R:0 D:0
      JOB_ID: REQUIRED BINARY O:UTF8 R:0 D:0
      SALARY: OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1
      COMMISSION_PCT: OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1
      MANAGER_ID: OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1
      DEPARTMENT_ID: OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1

      row group 1: RC:107 TS:9943 OFFSET:4
      --------------------------------------------------------------------------------
      EMPLOYEE_ID: FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:4 SZ:360/355/0.99 VC:107 ENC:PLAIN,BIT_PACKED
      FIRST_NAME: BINARY SNAPPY DO:0 FPO:364 SZ:902/1058/1.17 VC:107 ENC:PLAIN_DICTIONARY,RLE,BIT_PACKED
      LAST_NAME: BINARY SNAPPY DO:0 FPO:1266 SZ:913/1111/1.22 VC:107 ENC:PLAIN,BIT_PACKED
      EMAIL: BINARY SNAPPY DO:0 FPO:2179 SZ:977/1184/1.21 VC:107 ENC:PLAIN,BIT_PACKED
      PHONE_NUMBER: BINARY SNAPPY DO:0 FPO:3156 SZ:750/1987/2.65 VC:107 ENC:PLAIN,RLE,BIT_PACKED
      HIRE_DATE: BINARY SNAPPY DO:0 FPO:3906 SZ:874/2636/3.02 VC:107 ENC:PLAIN_DICTIONARY,BIT_PACKED
      JOB_ID: BINARY SNAPPY DO:0 FPO:4780 SZ:254/302/1.19 VC:107 ENC:PLAIN_DICTIONARY,BIT_PACKED
      SALARY: FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:5034 SZ:419/580/1.38 VC:107 ENC:PLAIN,RLE,BIT_PACKED
      COMMISSION_PCT: FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:5453 SZ:97/113/1.16 VC:107 ENC:PLAIN,RLE,BIT_PACKED
      MANAGER_ID: FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:5550 SZ:168/363/2.16 VC:107 ENC:PLAIN,RLE,BIT_PACKED
      DEPARTMENT_ID: FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:5718 SZ:94/254/2.70 VC:107 ENC:PLAIN,RLE,BIT_PACKED

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              daveoshinsky Dave Oshinsky
              Rahul Kumar Challapalli Rahul Kumar Challapalli
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: