  1. CarbonData
  2. CARBONDATA-1010

Data is loaded differently in Hive and CarbonData, which causes comparison failures on SELECT * statements

Details

    • Type: Bug
    • Status: Closed
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: 1.1.0
    • Fix Version/s: None
    • Component/s: None
    • Labels: None
    • Environment: Spark1.6

    Description

      carbondata:

      CREATE TABLE uniqdata_char (CUST_ID int,CUST_NAME char(30),ACTIVE_EMUI_VERSION char(30), DOB timestamp,
      DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10),
      DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,
      INTEGER_COLUMN1 int) STORED BY 'org.apache.carbondata.format'
      TBLPROPERTIES ("TABLE_BLOCKSIZE"= "256 MB")

      LOAD DATA INPATH 'hdfs://192.168.2.145:54310/BabuStore/Data/uniqdata/2000_UniqData.csv' into table uniqdata_char OPTIONS('DELIMITER'=',' ,'QUOTECHAR'='"','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1')

      0: jdbc:hive2://hadoop-master:10000> ;
      +---------+
      | Result  |
      +---------+
      +---------+
      No rows selected (3.988 seconds)
      0: jdbc:hive2://hadoop-master:10000> select CUST_NAME from uniqdata_char;
      +------------------+
      | CUST_NAME        |
      +------------------+
      |                  |
      |                  |
      |                  |
      |                  |
      |                  |
      |                  |
      |                  |
      |                  |
      |                  |
      |                  |
      |                  |
      |                  |
      | CUST_NAME_00000  |
      | CUST_NAME_00000  |
      | CUST_NAME_00001  |
      | CUST_NAME_00002  |
      | CUST_NAME_00003  |
      | CUST_NAME_00004  |
      | CUST_NAME_00005  |
      | CUST_NAME_00006  |
      | CUST_NAME_00007  |
      | CUST_NAME_00008  |
      | CUST_NAME_00009  |
      | CUST_NAME_00010  |

      Hive:

      CREATE TABLE uniqdata_char (CUST_ID int,CUST_NAME char(30),ACTIVE_EMUI_VERSION char(30), DOB timestamp,
      DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10),
      DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,
      INTEGER_COLUMN1 int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      +---------+
      | result  |
      +---------+
      +---------+
      No rows selected (0.093 seconds)
      0: jdbc:hive2://hadoop-master:10000> LOAD DATA LOCAL INPATH '/opt/Carbon/CarbonData/TestData/Data/uniqdata/2000_UniqData.csv' into table uniqdata_char ;
      +---------+
      | Result  |
      +---------+
      +---------+
      No rows selected (0.228 seconds)
      0: jdbc:hive2://hadoop-master:10000> select CUST_NAME from uniqdata_char ;
      +---------------------------------+
      | CUST_NAME                       |
      +---------------------------------+
      |                                 |
      |                                 |
      | CUST_NAME_00000                 |
      |                                 |
      |                                 |
      |                                 |
      |                                 |
      |                                 |
      |                                 |
      |                                 |
      |                                 |
      |                                 |
      |                                 |
      | CUST_NAME_00000                 |
      | CUST_NAME_00001                 |
      | CUST_NAME_00002                 |
      | CUST_NAME_00003                 |
      | CUST_NAME_00004                 |
      | CUST_NAME_00005                 |
      | CUST_NAME_00006                 |
      | CUST_NAME_00007                 |
      | CUST_NAME_00008                 |
      | CUST_NAME_00009                 |
      | CUST_NAME_00010                 |

      There is a data mismatch in the results of the select query. In CarbonData all the empty CUST_NAME values are displayed first, whereas Hive does not return them in that order. Because of that, comparison failures occur when the query is executed in the automation framework.
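
      For illustration only: a SELECT without ORDER BY does not guarantee any row order, so a check that compares the two outputs position by position can fail even when both engines hold the same rows. A minimal sketch of an order-insensitive comparison follows. It assumes that row order is the only difference between the two outputs. The helper name and the sample rows are hypothetical and are not taken from the automation framework or from the attached CSV.

```python
from collections import Counter


def same_rows_ignoring_order(rows_a, rows_b):
    """Return True when both result sets contain the same rows with the
    same multiplicities, regardless of the order they were returned in."""
    return Counter(map(tuple, rows_a)) == Counter(map(tuple, rows_b))


# Hypothetical miniature of the two outputs: same values, different order.
carbon_rows = [("",), ("",), ("CUST_NAME_00000",), ("CUST_NAME_00001",)]
hive_rows = [("",), ("CUST_NAME_00000",), ("",), ("CUST_NAME_00001",)]

print(carbon_rows == hive_rows)                          # False: positional comparison
print(same_rows_ignoring_order(carbon_rows, hive_rows))  # True: multiset comparison
```

      Counting rows as a multiset keeps duplicates significant, so a missing or extra row would still be reported. Adding an explicit ORDER BY CUST_NAME to both queries would be another way to make a positional comparison meaningful.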

      Attachments

        1. 2000_UniqData.csv
          367 kB
          SWATI RAO

            People

              Assignee: Unassigned
              Reporter: SWATI RAO (swati.rao)