  1. HBase
  2. HBASE-7645

put without timestamp duplicates the record/row

Details

    • Type: Brainstorming
    • Status: Closed
    • Priority: Trivial
    • Resolution: Not A Problem
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Client
    • Labels: None

    Description

      If I run Sqoop several times on the same dataset with HBase as the output,
      I end up with duplicated data:

      hbase(main):030:0> get "dump_HKFAS.sales_order", "1", {COLUMN => "mysql:created_at", VERSIONS => 4}
      COLUMN                             CELL                                                                                             
      mysql:created_at                  timestamp=1358853505756, value=2011-12-21 18:07:38.0                                             
      mysql:created_at                  timestamp=1358790515451, value=2011-12-21 18:07:38.0                                             
      2 row(s) in 0.0040 seconds
      
      today's sqoop run
      hbase(main):031:0> Date.new(1358853505756).toString()
      => "Tue Jan 22 11:18:25 UTC 2013"
      
      yesterday's sqoop run
      hbase(main):032:0> Date.new(1358790515451).toString()
      => "Mon Jan 21 17:48:35 UTC 2013"
      

      Put.add() writes the KeyValue without checking whether the value has changed, so a put that differs only in its timestamp is stored as a new version. Is this by design, or is it a bug?

      In other words, what is the idea behind this? Is Sqoop (the client application) supposed to read the current value before calling add()?
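
To make the client-side option concrete, here is a minimal sketch of read-before-write on a toy model of a single cell. It uses only the Java standard library and is not the HBase client API: `VersionedCellSketch`, `put` and `putIfChanged` are hypothetical names, and the model assumes a cell is simply a map from timestamp to value. The timestamps and value are the ones from the shell output above.

```java
import java.util.TreeMap;

// Toy model of one HBase cell (one row + one column). NOT the HBase client API.
class VersionedCellSketch {
    // Versions keyed by timestamp; the highest key is the newest version.
    private final TreeMap<Long, String> versions = new TreeMap<>();

    // Unconditional write: a timestamp not seen before always adds a version.
    void put(long ts, String value) {
        versions.put(ts, value);
    }

    // Read-before-write: skip the write when the newest version already holds this value.
    boolean putIfChanged(long ts, String value) {
        if (!versions.isEmpty() && versions.lastEntry().getValue().equals(value)) {
            return false; // value unchanged, nothing stored
        }
        versions.put(ts, value);
        return true;
    }

    int versionCount() {
        return versions.size();
    }

    public static void main(String[] args) {
        String value = "2011-12-21 18:07:38.0";

        VersionedCellSketch blind = new VersionedCellSketch();
        blind.put(1358790515451L, value);  // yesterday's run
        blind.put(1358853505756L, value);  // today's run
        System.out.println(blind.versionCount());   // prints 2

        VersionedCellSketch guarded = new VersionedCellSketch();
        guarded.putIfChanged(1358790515451L, value);
        guarded.putIfChanged(1358853505756L, value); // skipped, same value
        System.out.println(guarded.versionCount()); // prints 1
    }
}
```

In this model the read and the write are two separate steps, so against a real table the check would cost an extra read per cell and would not be atomic with the write.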

      from: trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/client/Put.java

        public Put add(byte [] family, byte [] qualifier, byte [] value) {
          return add(family, qualifier, this.ts, value);
        }
      
        public Put add(byte [] family, byte [] qualifier, long ts, byte [] value) {
          List<KeyValue> list = getKeyValueList(family);
          KeyValue kv = createPutKeyValue(family, qualifier, ts, value);
          list.add(kv);
          familyMap.put(kv.getFamily(), list);
          return this;
        }
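
The second overload above takes an explicit `ts`, which suggests another option: if every import run passes the same timestamp for the same source value, the repeated write lands on the same version instead of adding one. The sketch below illustrates that idea with the standard library only. `DeterministicTimestampSketch` and `timestampFor` are hypothetical names, the cell is again modelled as a map from timestamp to value, and treating `created_at` as UTC is an assumption.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.TreeMap;

// Toy model: derive the cell timestamp from the data instead of the write time.
// NOT the HBase client API.
class DeterministicTimestampSketch {
    // Matches the values seen in the shell output, e.g. "2011-12-21 18:07:38.0".
    private static final DateTimeFormatter SOURCE_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.S");

    // Hypothetical helper: epoch milliseconds of the source value, read as UTC (assumption).
    static long timestampFor(String createdAt) {
        return LocalDateTime.parse(createdAt, SOURCE_FORMAT)
                .toInstant(ZoneOffset.UTC)
                .toEpochMilli();
    }

    // Simulate two import runs writing the same value with the derived timestamp.
    static int versionsAfterTwoRuns(String createdAt) {
        TreeMap<Long, String> versions = new TreeMap<>();
        for (int run = 0; run < 2; run++) {
            // Same timestamp on both runs, so the second write replaces the first.
            versions.put(timestampFor(createdAt), createdAt);
        }
        return versions.size();
    }

    public static void main(String[] args) {
        System.out.println(versionsAfterTwoRuns("2011-12-21 18:07:38.0")); // prints 1
    }
}
```

This only helps when a stable timestamp can be derived from the source row. For columns without one, something like a last-modified column would be needed, and whether that exists in the source table is not stated here.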
      

            People

              Assignee: Unassigned
              Reporter: Guido Serra aka Zeph (zeph)