Uploaded image for project: 'Phoenix'
  1. Phoenix
  2. PHOENIX-6364

CSVBulkload will cause duplicate data to be queried when a global index is created for each field in the table.

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Not A Bug
    • 4.13.1, 5.0.0
    • None
    • core
    • None

    Description

      HBase version :1.3.1

      phoenix version: apache-phoenix-4.13.0-HBase-1.3

       (download from http://phoenix.apache.org/download.html)

      phoneinx client version: apache-phoenix-4.13.0-HBase-1.3

        (download from http://phoenix.apache.org/download.html)

      step 1:create table

      0: jdbc:phoenix> create table testtable3(
      . . . . . . . .> DATE varchar not null,
      . . . . . . . .> NUM integer not null,
      . . . . . . . .> SEQ_NUM integer not null,
      . . . . . . . .> ACCOUNT1 varchar not null,
      . . . . . . . .> ACCOUNTDES varchar,
      . . . . . . . .> FLAG varchar,
      . . . . . . . .> SALL DOUBLE,
      . . . . . . . .> CONSTRAINT PK PRIMARY KEY (DATE,NUM,SEQ_NUM,ACCOUNT1)
      . . . . . . . .> );

      step 2: upsert data with primary key 

       UPSERT INTO testtable3 (DATE,NUM,SEQ_NUM,ACCOUNT1,ACCOUNTDES,FLAG,SALL) values ('20201001',30201001,13,'367392332','sffa1','','');
      UPSERT INTO testtable3 (DATE,NUM,SEQ_NUM,ACCOUNT1,ACCOUNTDES,FLAG,SALL) values ('20201002',30201002,14,'367392333','sffa2','','');
      UPSERT INTO testtable3 (DATE,NUM,SEQ_NUM,ACCOUNT1,ACCOUNTDES,FLAG,SALL) values ('20201003',30201003,15,'367392334','sffa3','','');
      UPSERT INTO testtable3 (DATE,NUM,SEQ_NUM,ACCOUNT1,ACCOUNTDES,FLAG,SALL) values ('20201004',30201004,16,'367392335','sffa4','','');
      UPSERT INTO testtable3 (DATE,NUM,SEQ_NUM,ACCOUNT1,ACCOUNTDES,FLAG,SALL) values ('20201005',30201005,17,'367392336','sffa5','','');
      UPSERT INTO testtable3 (DATE,NUM,SEQ_NUM,ACCOUNT1,ACCOUNTDES,FLAG,SALL) values ('20201006',30201006,18,'367392337','sffa6','','');
      UPSERT INTO testtable3 (DATE,NUM,SEQ_NUM,ACCOUNT1,ACCOUNTDES,FLAG,SALL) values ('20201007',30201007,19,'367392338','sffa7','','');
      UPSERT INTO testtable3 (DATE,NUM,SEQ_NUM,ACCOUNT1,ACCOUNTDES,FLAG,SALL) values ('20201008',30201008,20,'367392339','sffa8','','');
      UPSERT INTO testtable3 (DATE,NUM,SEQ_NUM,ACCOUNT1,ACCOUNTDES,FLAG,SALL) values ('20201009',30201009,21,'367392340','sffa9','','');
      UPSERT INTO testtable3 (DATE,NUM,SEQ_NUM,ACCOUNT1,ACCOUNTDES,FLAG,SALL) values ('20201010',30201010,22,'367392341','sffa10','','');

       step 3: create global index ,more than primary key

       CREATE INDEX testtable3_ID ON testtable3 (ACCOUNT1,DATE,NUM,ACCOUNTDES,SEQ_NUM);

       step 3: CSVBulkload  data,primary key same as before but other filed different

      20201001 30201001 13 367392332 sffa2 1231243 23
      20201002 30201002 14 367392333 sffa3 1231244 24
      20201003 30201003 15 367392334 sffa4 1231245 25
      20201004 30201004 16 367392335 sffa5 1231246 26
      20201005 30201005 17 367392336 sffa6 1231247 27
      20201006 30201006 18 367392337 sffa7 1231248 28
      20201007 30201007 19 367392338 sffa8 1231249 29
      20201008 30201008 20 367392339 sffa9 1231250 30
      20201009 30201009 21 367392340 sffa10 1231251 31
      20201010 30201010 22 367392341 sffa11 1231252 32

      step 4:select data 

      select DATE,NUM,SEQ_NUM,ACCOUNT1 from testtable3;
      -------------------+---------------------

      DATE NUM SEQ_NUM ACCOUNT1

      -------------------+---------------------

      20201001 20201001 13 367392332
      20201001 30201001 13 367392332
      20201001 30201001 13 367392332
      20201002 30201002 14 367392333
      20201002 30201002 14 367392333
      20201003 30201003 15 367392334
      20201003 30201003 15 367392334
      20201004 30201004 16 367392335
      20201004 30201004 16 367392335
      20201005 30201005 17 367392336
      20201005 30201005 17 367392336
      20201006 30201006 18 367392337
      20201006 30201006 18 367392337
      20201007 30201007 19 367392338
      20201007 30201007 19 367392338
      20201008 30201008 20 367392339
      20201008 30201008 20 367392339
      20201009 30201009 21 367392340
      20201009 30201009 21 367392340
      20201010 30201010 22 367392341
      20201010 30201010 22 367392341

      -------------------+---------------------

      and we can see index data :

      0: jdbc:phoenix> select * from testtable3_ID;
      2021-02-04 19:50:48,685 | INFO | hconnection-0x3943a2be-shared--pool1-t352 | RPC Server Kerberos principal name for service=ClientService is hbase/hadoop.hadoop.com@HADOOP.COM | org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.processPreambleResponse(RpcClientImpl.java:824)
      2021-02-04 19:50:48,699 | INFO | hconnection-0x3943a2be-shared--pool1-t353 | RPC Server Kerberos principal name for service=ClientService is hbase/hadoop.hadoop.com@HADOOP.COM | org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.processPreambleResponse(RpcClientImpl.java:824)
      --------------------+------------------------+----------

      :ACCOUNT1 :DATE :NUM 0:ACCOUNTDES :SEQ_NUM

      --------------------+------------------------+----------

      367392332 20201001 20201001 sffa1 13
      367392332 20201001 30201001 sffa1 13
      367392332 20201001 30201001 sffa2 13
      367392333 20201002 30201002 sffa2 14
      367392333 20201002 30201002 sffa3 14
      367392334 20201003 30201003 sffa3 15
      367392334 20201003 30201003 sffa4 15
      367392335 20201004 30201004 sffa4 16
      367392335 20201004 30201004 sffa5 16
      367392336 20201005 30201005 sffa5 17
      367392336 20201005 30201005 sffa6 17
      367392337 20201006 30201006 sffa6 18
      367392337 20201006 30201006 sffa7 18
      367392338 20201007 30201007 sffa7 19
      367392338 20201007 30201007 sffa8 19
      367392339 20201008 30201008 sffa8 20
      367392339 20201008 30201008 sffa9 20
      367392340 20201009 30201009 sffa10 21
      367392340 20201009 30201009 sffa9 21
      367392341 20201010 30201010 sffa10 22
      367392341 20201010 30201010 sffa11 22

      --------------------+------------------------+----------

      or scan with hbase shell:

      hbase(main):007:0> scan "TESTTABLE3_ID"
      ROW COLUMN+CELL
      2021-02-04 20:20:40,157 | INFO | hconnection-0x713a35c5-shared--pool1-t3 | RPC Server Kerberos principal name for service=ClientService is hbase/hadoop.hadoop.com@HADOOP.COM | org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.processPreambleResponse(RpcClientImpl.java:824)
      367392332\x0020201001\x00\x814>)sffa1\x00\x80\x00\x00\x0D column=0:_0, timestamp=1612438808215, value=x
      367392332\x0020201001\x00\x81\xCC\xD4\xA9sffa1\x00\x80\x00\ column=0:_0, timestamp=1612438808215, value=x
      x00\x0D
      367392332\x0020201001\x00\x81\xCC\xD4\xA9sffa2\x00\x80\x00\ column=0:_0, timestamp=1612439432910, value=_0
      x00\x0D
      367392333\x0020201002\x00\x81\xCC\xD4\xAAsffa2\x00\x80\x00\ column=0:_0, timestamp=1612438808215, value=x
      x00\x0E
      367392333\x0020201002\x00\x81\xCC\xD4\xAAsffa3\x00\x80\x00\ column=0:_0, timestamp=1612439432910, value=_0
      x00\x0E
      367392334\x0020201003\x00\x81\xCC\xD4\xABsffa3\x00\x80\x00\ column=0:_0, timestamp=1612438808215, value=x
      x00\x0F
      367392334\x0020201003\x00\x81\xCC\xD4\xABsffa4\x00\x80\x00\ column=0:_0, timestamp=1612439432910, value=_0
      x00\x0F
      367392335\x0020201004\x00\x81\xCC\xD4\xACsffa4\x00\x80\x00\ column=0:_0, timestamp=1612438808215, value=x
      x00\x10
      367392335\x0020201004\x00\x81\xCC\xD4\xACsffa5\x00\x80\x00\ column=0:_0, timestamp=1612439432910, value=_0
      x00\x10
      367392336\x0020201005\x00\x81\xCC\xD4\xADsffa5\x00\x80\x00\ column=0:_0, timestamp=1612438808215, value=x
      x00\x11
      367392336\x0020201005\x00\x81\xCC\xD4\xADsffa6\x00\x80\x00\ column=0:_0, timestamp=1612439432910, value=_0
      x00\x11
      367392337\x0020201006\x00\x81\xCC\xD4\xAEsffa6\x00\x80\x00\ column=0:_0, timestamp=1612438808215, value=x
      x00\x12
      367392337\x0020201006\x00\x81\xCC\xD4\xAEsffa7\x00\x80\x00\ column=0:_0, timestamp=1612439432910, value=_0
      x00\x12
      367392338\x0020201007\x00\x81\xCC\xD4\xAFsffa7\x00\x80\x00\ column=0:_0, timestamp=1612438808215, value=x
      x00\x13
      367392338\x0020201007\x00\x81\xCC\xD4\xAFsffa8\x00\x80\x00\ column=0:_0, timestamp=1612439432910, value=_0
      x00\x13
      367392339\x0020201008\x00\x81\xCC\xD4\xB0sffa8\x00\x80\x00\ column=0:_0, timestamp=1612438808215, value=x
      x00\x14
      367392339\x0020201008\x00\x81\xCC\xD4\xB0sffa9\x00\x80\x00\ column=0:_0, timestamp=1612439432910, value=_0
      x00\x14
      367392340\x0020201009\x00\x81\xCC\xD4\xB1sffa10\x00\x80\x00 column=0:_0, timestamp=1612439432910, value=_0
      \x00\x15
      367392340\x0020201009\x00\x81\xCC\xD4\xB1sffa9\x00\x80\x00\ column=0:_0, timestamp=1612438808215, value=x
      x00\x15
      367392341\x0020201010\x00\x81\xCC\xD4\xB2sffa10\x00\x80\x00 column=0:_0, timestamp=1612438808215, value=x
      \x00\x16
      367392341\x0020201010\x00\x81\xCC\xD4\xB2sffa11\x00\x80\x00 column=0:_0, timestamp=1612439432910, value=_0
      \x00\x16

       

      Attachments

        Activity

          People

            Unassigned Unassigned
            ShuangQi XiaShuangQi
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: