Uploaded image for project: 'CarbonData'
  1. CarbonData
  2. CARBONDATA-1703

Difference in result set count of carbon and hive during select query with Null values in IN expression

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Minor
    • Resolution: Fixed
    • 1.3.0
    • None
    • data-query
    • None
    • spark 2.1

    Description

      Incorrect result displays after applying select query.

      Steps to reproduce:
      1) Create table stored by carbondata and load data in it:

      a) CREATE TABLE uniqdata (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES ("TABLE_BLOCKSIZE"= "256 MB");

      b) LOAD DATA INPATH 'hdfs://localhost:54310/Data/uniqdata/2000_UniqData.csv' into table uniqdata OPTIONS('DELIMITER'=',' , 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1');

      2) Create hive table:
      a) CREATE TABLE uniqdata_h (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

      b) load data local inpath '/home/knoldus/Desktop/csv/TestData/Data/uniqdata/2000_UniqData.csv' into table uniqdata_h;

      3) Execute Query:
      a) SELECT CUST_ID,CUST_NAME,DOB,BIGINT_COLUMN1,DECIMAL_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN2 from (select * from uniqdata) SUB_QRY WHERE (CUST_ID in (10020,10030,10032,10035,10040,10060,NULL) or INTEGER_COLUMN1 not in (1021,1031,1032,1033,NULL)) and (Double_COLUMN1 not in (1.12345674897976E10,NULL) or DECIMAL_COLUMN2 in (22345679921.1234000000,NULL));

      b) SELECT CUST_ID,CUST_NAME,DOB,BIGINT_COLUMN1,DECIMAL_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN2 from (select * from uniqdata_h) SUB_QRY WHERE (CUST_ID in (10020,10030,10032,10035,10040,10060,NULL) or INTEGER_COLUMN1 not in (1021,1031,1032,1033,NULL)) and (Double_COLUMN1 not in (1.12345674897976E10,NULL) or DECIMAL_COLUMN2 in (22345679921.1234000000,NULL));

      4) Expected Result: both results should be same.

      5) Actual Result:

      a) carbondata table result:
      ----------------------------------------------+

      CUST_ID CUST_NAME DOB BIGINT_COLUMN1 DECIMAL_COLUMN1 Double_COLUMN2 INTEGER_COLUMN1 DECIMAL_COLUMN2 Double_COLUMN2

      -----------------------------------------------------------------------------------------------------------------------------------------------------------------------+

      NULL   NULL NULL NULL NULL NULL NULL NULL
      NULL   NULL 1233720368578 NULL NULL NULL NULL NULL
      NULL   NULL NULL NULL NULL NULL NULL NULL
      NULL   NULL NULL 12345678901.1234000000 NULL NULL NULL NULL
      NULL   NULL NULL NULL NULL NULL NULL NULL
      NULL   NULL NULL NULL -1.12345674897976E10 NULL NULL -1.12345674897976E10
      NULL   NULL NULL NULL NULL 0 NULL NULL
      NULL   NULL NULL NULL NULL NULL NULL NULL
      NULL   1970-01-01 11:00:03.0 NULL NULL NULL NULL NULL NULL
      NULL   NULL NULL NULL NULL NULL NULL NULL
      NULL CUST_NAME_00000 NULL NULL NULL NULL NULL NULL NULL
      10020 CUST_NAME_01020 1972-10-17 01:00:03.0 123372037874 12345679921.1234000000 -1.12345674897976E10 1021 22345679921.1234000000 -1.12345674897976E10

      -----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      12 rows selected (1.391 seconds)

      b) hive table result:
      ----------------------------------------------+

      CUST_ID CUST_NAME DOB BIGINT_COLUMN1 DECIMAL_COLUMN1 Double_COLUMN2 INTEGER_COLUMN1 DECIMAL_COLUMN2 Double_COLUMN2

      -----------------------------------------------------------------------------------------------------------------------------------------------------------------------+

      10020 CUST_NAME_01020 1972-10-17 01:00:03.0 123372037874 12345679921.1234000000 -1.12345674897976E10 1021 22345679921.1234000000 -1.12345674897976E10

      -----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      1 row selected (0.408 seconds)

      Attachments

        1. 2000_UniqData.csv
          367 kB
          Vandana Yadav

        Issue Links

          Activity

            People

              Unassigned Unassigned
              Vandana7 Vandana Yadav
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 9h 10m
                  9h 10m