  1. HBase
  2. HBASE-20305

Add option to SyncTable that skips deletes on target cluster





      We had a situation where two clusters with active-active replication got out of sync, but both held data that should be kept. The tables in question never have data deleted, but ingestion had happened on both clusters, and some rows had even been updated.

      In this scenario, a cell that is present in only one of the clusters should not be deleted, but replayed on the other. Also, for cells with the same identifier but different values, the most recent value should be kept. The current version of SyncTable is not applicable here, because it simply copies the whole state from source to target, losing any additional rows that exist only in the target, as well as cell values that received a more recent update there. This could be solved by adding an option to SyncTable that skips deletes. That way, additional cells not present on the source would still be kept. For cells with the same identifier but different values, it would just perform a Put for the cell version from the source, and client scans would still fetch the value with the most recent timestamp.
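To make the intended semantics concrete, here is a minimal sketch in plain Java. It does not use the HBase client API or the actual patch: the class `SkipDeletesSketch`, its methods, and the cell identifiers are all hypothetical, and a table is modelled simply as a map from cell identifier to timestamped versions. It only illustrates that copying source versions without issuing deletes keeps target-only cells, and that a read of the latest timestamp still returns the most recent value.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model, not HBase code: cell identifier -> (timestamp -> value).
public class SkipDeletesSketch {

    static Map<String, TreeMap<Long, String>> newTable() {
        return new HashMap<>();
    }

    // Store one version of a cell, keyed by its timestamp.
    static void put(Map<String, TreeMap<Long, String>> table,
                    String cellId, long timestamp, String value) {
        table.computeIfAbsent(cellId, k -> new TreeMap<>()).put(timestamp, value);
    }

    // Sync with deletes skipped: every source version is written to the target,
    // and nothing that exists only on the target is removed.
    static void syncSkippingDeletes(Map<String, TreeMap<Long, String>> source,
                                    Map<String, TreeMap<Long, String>> target) {
        for (Map.Entry<String, TreeMap<Long, String>> cell : source.entrySet()) {
            for (Map.Entry<Long, String> version : cell.getValue().entrySet()) {
                put(target, cell.getKey(), version.getKey(), version.getValue());
            }
        }
    }

    // Read the way a default scan would in this model: the version with the highest timestamp.
    static String read(Map<String, TreeMap<Long, String>> table, String cellId) {
        TreeMap<Long, String> versions = table.get(cellId);
        return versions == null ? null : versions.lastEntry().getValue();
    }

    public static void main(String[] args) {
        Map<String, TreeMap<Long, String>> source = newTable();
        Map<String, TreeMap<Long, String>> target = newTable();

        put(source, "row1/cf:q", 100L, "older-from-source");
        put(source, "row2/cf:q", 100L, "only-on-source");
        put(target, "row1/cf:q", 200L, "newer-on-target");
        put(target, "row3/cf:q", 150L, "only-on-target");

        syncSkippingDeletes(source, target);

        System.out.println(read(target, "row1/cf:q")); // newer-on-target
        System.out.println(read(target, "row2/cf:q")); // only-on-source
        System.out.println(read(target, "row3/cf:q")); // only-on-target
    }
}
```

In this model the older source version of `row1/cf:q` is still written to the target, but it sits behind the newer target version, which is the behaviour described above for cells with the same identifier and different values.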

      I'm attaching a patch with this additional option shortly. Please share your thoughts.




        1. 0001-HBASE-20305.master.001.patch
          10 kB
          Wellington Chevreuil
        2. HBASE-20305.branch-1.001.patch
          21 kB
          Wellington Chevreuil
        3. HBASE-20305.branch-2.001.patch
          17 kB
          Wellington Chevreuil
        4. HBASE-20305.master.002.patch
          17 kB
          Wellington Chevreuil



            • Assignee:
              wchevreuil Wellington Chevreuil
            • Votes:
              0
            • Watchers:
              8


              • Created: