[PHOENIX-2521] Support duplicate rows in CSV Bulk Loader - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 4.5.2
Fix Version/s: None
Component/s: None
Labels:
None

Description

found out the map reduce csv bulk load tool doesn't behave the same as UPSERTs. Is it by design or a bug?

Here is the queries for creating table and index:

 CREATE TABLE mySchema.mainTable (
id varchar NOT NULL,
name varchar,
address varchar
CONSTRAINT pk PRIMARY KEY (id));

 CREATE INDEX myIndex 
ON mySchema.mainTable  (name, id) 
INCLUDE (address);

if I execute two upserts where the second one update the name (which is the key for index), everything works fine (the record will be updated in both table and index table)

 UPSERT INTO mySchema.mainTable (id, name, address) values ('1', 'john', 'Montreal');

UPSERT INTO mySchema.mainTable (id, name, address) values ('1', 'jack', 'Montreal');

SELECT /*+ INDEX(mySchema.mainTable myIndex) */ * from mySchema.mainTable where name = 'jack';

==> one record

SELECT /*+ INDEX(mySchema.mainTable myIndex) */ * from mySchema.mainTable where name = 'john';

==> zero records

But, if I load the date using org.apache.phoenix.mapreduce.CsvBulkLoadTool to the main table, it behaves different. The main table will be updated, but the new record will be appended to the index table:

HADOOP_CLASSPATH=/usr/lib/hbase/lib/hbase-protocol-1.1.2.jar:/etc/hbase/conf hadoop jar /usr/lib/hbase/phoenix-4.5.2-HBase-1.1-bin/phoenix-4.5.2-HBase-1.1-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool -d',' -s mySchema -t mainTable -i /tmp/input.txt

input.txt:
2,tomas,montreal
2,george,montreal

(I have tried it both with/without -it and got the same result)

SELECT /*+ INDEX(mySchema.mainTable myIndex) */ * from mySchema.mainTable where name = 'tomas'

==> one record;

 SELECT /*+ INDEX(mySchema.mainTable myIndex) */ * from mySchema.mainTable where name = 'george'

==> one record;

Attachments

Activity

People

Assignee:: Unassigned

Reporter:: Afshin Moazami

Votes:: 0 Vote for this issue

Watchers:: 2 Start watching this issue

Dates

Created:: 13/Dec/15 14:52

Updated:: 23/Jan/16 20:01