Details
- Type: Brainstorming
- Status: Closed
- Priority: Trivial
- Resolution: Not A Problem
Description
If I run Sqoop a couple of times on the same dataset, exporting to HBase, I end up with duplicated data:
hbase(main):030:0> get "dump_HKFAS.sales_order", "1", {COLUMN => "mysql:created_at", VERSIONS => 4}
COLUMN                CELL
 mysql:created_at     timestamp=1358853505756, value=2011-12-21 18:07:38.0
 mysql:created_at     timestamp=1358790515451, value=2011-12-21 18:07:38.0
2 row(s) in 0.0040 seconds

today's sqoop run:
hbase(main):031:0> Date.new(1358853505756).toString()
=> "Tue Jan 22 11:18:25 UTC 2013"

yesterday's sqoop run:
hbase(main):032:0> Date.new(1358790515451).toString()
=> "Mon Jan 21 17:48:35 UTC 2013"
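As context for the transcript above (a note beyond the original report): HBase by design stores every Put as a new cell version keyed by timestamp, and never compares the incoming value against what is already stored. If only the latest value matters, a common workaround is to cap the column family at a single version so older duplicates are dropped on compaction. A sketch in the HBase shell, reusing the table and family names from the example (in this HBase era the table must be disabled before altering):

```
hbase(main):001:0> disable 'dump_HKFAS.sales_order'
hbase(main):002:0> alter 'dump_HKFAS.sales_order', {NAME => 'mysql', VERSIONS => 1}
hbase(main):003:0> enable 'dump_HKFAS.sales_order'
```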
The Put.add() method writes the KeyValue without checking whether the value (apart from the timestamp) has actually changed. Is this by design, or a bug?
I mean, what is the idea behind it? Is Sqoop (the client application) supposed to read the current value before issuing the add() call?
from: trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/client/Put.java
public Put add(byte [] family, byte [] qualifier, byte [] value) {
  return add(family, qualifier, this.ts, value);
}

public Put add(byte [] family, byte [] qualifier, long ts, byte [] value) {
  List<KeyValue> list = getKeyValueList(family);
  KeyValue kv = createPutKeyValue(family, qualifier, ts, value);
  list.add(kv);
  familyMap.put(kv.getFamily(), list);
  return this;
}
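If the client is indeed expected to avoid redundant cells itself, one option is a read-compare-write pattern: fetch the current value with a Get and only issue the Put when the value differs. A minimal sketch against the HBase 0.94-era client API, reusing the table, row, and column names from the example above (the code is illustrative and needs a running cluster):

```java
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class PutIfChanged {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "dump_HKFAS.sales_order");

    byte[] row = Bytes.toBytes("1");
    byte[] family = Bytes.toBytes("mysql");
    byte[] qualifier = Bytes.toBytes("created_at");
    byte[] newValue = Bytes.toBytes("2011-12-21 18:07:38.0");

    // Read the currently stored value first.
    Result result = table.get(new Get(row));
    byte[] current = result.getValue(family, qualifier);

    // Only write when the value actually changed, so no duplicate
    // version is created for an identical value.
    if (!Arrays.equals(current, newValue)) {
      Put put = new Put(row);
      put.add(family, qualifier, newValue);
      table.put(put);
    }
    table.close();
  }
}
```

Note the race window between the Get and the Put: another writer may update the cell in between. HTable.checkAndPut() offers an atomic compare-and-write alternative for that case.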
Issue Links
- relates to SQOOP-834: duplicate of data exporting to hbase (Open)