  1. Sqoop (Retired)
  2. SQOOP-834

Duplicate data when exporting to HBase


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      Calling the HBase Put.add() method on an unchanged (previously inserted) row/value
      duplicates the data: a new cell version is stored with the same value, and only its timestamp is newer.

      hbase(main):030:0> get "dump_HKFAS.sales_order", "1", {COLUMN => "mysql:created_at", VERSIONS => 4}
      COLUMN                             CELL                                                                                             
      mysql:created_at                  timestamp=1358853505756, value=2011-12-21 18:07:38.0                                             
      mysql:created_at                  timestamp=1358790515451, value=2011-12-21 18:07:38.0                                             
      2 row(s) in 0.0040 seconds
      

      Today's Sqoop run:

      hbase(main):031:0> Date.new(1358853505756).toString()
      => "Tue Jan 22 11:18:25 UTC 2013"
      

      Yesterday's Sqoop run:

      hbase(main):032:0> Date.new(1358790515451).toString()
      => "Mon Jan 21 17:48:35 UTC 2013"
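
For anyone reproducing the check outside the HBase shell, the two cell timestamps can be decoded with plain Java. This is only a sketch: the class and method names are hypothetical, and the two literals are the epoch-millisecond timestamps copied from the `get` output above.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper, not part of Sqoop or HBase.
class CellTimestampSketch {
    // HBase cell timestamps are milliseconds since the Unix epoch by default.
    static Instant toInstant(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis);
    }

    // Time elapsed between two cell versions.
    static Duration gap(long olderMillis, long newerMillis) {
        return Duration.between(toInstant(olderMillis), toInstant(newerMillis));
    }

    public static void main(String[] args) {
        long yesterdayRun = 1358790515451L;
        long todayRun = 1358853505756L;
        System.out.println(toInstant(todayRun));     // 2013-01-22T11:18:25.756Z
        System.out.println(toInstant(yesterdayRun)); // 2013-01-21T17:48:35.451Z
        System.out.println(gap(yesterdayRun, todayRun).toMillis() + " ms apart");
    }
}
```

The two versions are roughly seventeen and a half hours apart, which is consistent with one version per Sqoop run.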
      

      I verified that this is the intended behavior on the server side, according to HBASE-7645.

      Instead, I would expect a rerun of Sqoop to update only the fields that changed, not to create a new version of every row in the HBase tables.
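
The expected behavior can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not Sqoop's code and not the HBase client API: `PutIfChangedSketch` and its methods are hypothetical names, and a map of timestamp to value stands in for the versions of a single cell. It contrasts an unconditional write, which stores a new version on every run, with a guarded write that first compares against the newest stored value.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Simplified stand-in for one versioned HBase cell; all names are hypothetical.
class PutIfChangedSketch {
    // timestamp (epoch millis) -> value; the last entry is the newest version
    private final TreeMap<Long, byte[]> versions = new TreeMap<>();

    // Unconditional write: stores a new version even if the value is identical.
    void put(long timestamp, byte[] value) {
        versions.put(timestamp, value.clone());
    }

    // Guarded write: skips the write when the newest version already holds this value.
    boolean putIfChanged(long timestamp, byte[] value) {
        Map.Entry<Long, byte[]> newest = versions.lastEntry();
        if (newest != null && Arrays.equals(newest.getValue(), value)) {
            return false; // unchanged, so no new version is created
        }
        put(timestamp, value);
        return true;
    }

    int versionCount() {
        return versions.size();
    }

    public static void main(String[] args) {
        byte[] createdAt = "2011-12-21 18:07:38.0".getBytes(StandardCharsets.UTF_8);

        PutIfChangedSketch unconditional = new PutIfChangedSketch();
        unconditional.put(1358790515451L, createdAt); // yesterday's run
        unconditional.put(1358853505756L, createdAt); // today's run, same value
        System.out.println("unconditional versions: " + unconditional.versionCount());

        PutIfChangedSketch guarded = new PutIfChangedSketch();
        guarded.putIfChanged(1358790515451L, createdAt);
        guarded.putIfChanged(1358853505756L, createdAt); // skipped
        System.out.println("guarded versions: " + guarded.versionCount());
    }
}
```

In a real job the comparison would presumably require reading the current value before each write, so the guard trades extra reads for fewer stored versions.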

            People

              zeph Guido Serra aka Zeph