HBASE-13639: SyncTable - rsync for HBase tables

Details

    • Reviewed
    Release note
      Tool that syncs two tables while sending only the differences, like rsync.

      Adds two new MapReduce jobs, SyncTable and HashTable. See each job's usage message for how to run it. See the design doc for a general overview: https://docs.google.com/document/d/1-2c9kJEWNrXf5V4q_wBcoIXfdchN7Pxvxv1IO6PW0-U/edit

      From comments below, "It can be challenging to run against a table getting live writes, if those writes are updates/overwrites. In general, you can run it against a time range to ignore new writes, but if those writes update existing cells, then the time range scan may or may not see older versions of those cells depending on whether major compaction has happened, which may be different in remote clusters."
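
The caveat quoted above can be illustrated with a toy model. This is only a sketch, not HBase code: `scan_time_range` and `major_compact` are hypothetical helpers, a cell is modeled as a list of `(timestamp, value)` pairs, and the column family is assumed to keep a single version after major compaction.

```python
def scan_time_range(versions, min_ts, max_ts):
    """Return the newest value with min_ts <= ts < max_ts, or None if none match."""
    in_range = [(ts, value) for ts, value in versions if min_ts <= ts < max_ts]
    return max(in_range)[1] if in_range else None


def major_compact(versions, max_versions=1):
    """Toy major compaction: keep only the newest max_versions versions."""
    return sorted(versions, reverse=True)[:max_versions]


# The same cell was written at ts=100 and overwritten at ts=200 on both clusters.
writes = [(100, b"old"), (200, b"new")]

source_cell = list(writes)           # assumption: source not yet major compacted
target_cell = major_compact(writes)  # assumption: target already major compacted

# A time range ending at ts=150 is meant to ignore the newer write.
source_seen = scan_time_range(source_cell, 0, 150)
target_seen = scan_time_range(target_cell, 0, 150)

print(source_seen)  # b'old'
print(target_seen)  # None
```

In this model the two scans disagree even though both clusters received identical writes, which is the kind of spurious difference the comment warns about.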

    Description

      Given HBase tables in remote clusters with similar but not identical data, efficiently update a target table such that the data in question is identical to a source table. Efficiency in this context means using far less network traffic than would be required to ship all the data from one cluster to the other. Takes inspiration from rsync.

      Design doc: https://docs.google.com/document/d/1-2c9kJEWNrXf5V4q_wBcoIXfdchN7Pxvxv1IO6PW0-U/
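
The rsync-style idea in the description can be sketched in a few lines. This is a minimal in-memory illustration, not the code from the patch: tables are plain dicts of row key to value, batches are cut by row count, and the names `hash_rows`, `hash_table` and `sync_table` are hypothetical. The source side hashes batches of sorted rows, the target hashes the same key ranges, and only rows in mismatched ranges are shipped.

```python
import hashlib


def hash_rows(rows):
    """Digest of a sorted list of (key, value) byte pairs, length-prefixed."""
    digest = hashlib.md5()
    for key, value in rows:
        for part in (key, value):
            digest.update(len(part).to_bytes(4, "big"))
            digest.update(part)
    return digest.digest()


def hash_table(source, batch_size):
    """Source side: return [(batch_start_key, digest)] over sorted rows."""
    items = sorted(source.items())
    if not items:
        return [(b"", hash_rows([]))]
    return [
        (items[i][0], hash_rows(items[i:i + batch_size]))
        for i in range(0, len(items), batch_size)
    ]


def sync_table(source, target, batch_hashes):
    """Target side: fix only mismatched key ranges; return rows shipped."""
    shipped = 0
    for i, (start, digest) in enumerate(batch_hashes):
        lo = None if i == 0 else start  # first range is open at the bottom
        hi = batch_hashes[i + 1][0] if i + 1 < len(batch_hashes) else None

        def in_range(key):
            return (lo is None or key >= lo) and (hi is None or key < hi)

        target_rows = sorted((k, v) for k, v in target.items() if in_range(k))
        if hash_rows(target_rows) == digest:
            continue  # hashes match: nothing crosses the network for this range

        source_rows = {k: v for k, v in source.items() if in_range(k)}
        for key, _ in target_rows:
            if key not in source_rows:
                del target[key]  # row exists only on the target
        for key, value in source_rows.items():
            if target.get(key) != value:
                target[key] = value  # row missing or different on the target
                shipped += 1
    return shipped


source = {b"row%02d" % i: b"value%d" % i for i in range(10)}
target = dict(source)
target[b"row03"] = b"stale"     # differing value
del target[b"row07"]            # missing row
target[b"row99"] = b"extra"     # row that exists only on the target

shipped = sync_table(source, target, hash_table(source, batch_size=3))
print(shipped, target == source)  # 2 True
```

With ten rows and three differences, only two rows are shipped; the rest of the comparison uses one digest per batch.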

      Attachments

        1. HBASE-13639.patch
          79 kB
          Dave Latham
        2. HBASE-13639-0.98.patch
          85 kB
          Andrew Kyle Purtell
        3. HBASE-13639-0.98-addendum-hadoop-1.patch
          8 kB
          Andrew Kyle Purtell
        4. HBASE-13639-v1.patch
          79 kB
          Dave Latham
        5. HBASE-13639-v2.patch
          81 kB
          Dave Latham
        6. HBASE-13639-v3.patch
          85 kB
          Dave Latham
        7. HBASE-13639-v3-0.98.patch
          89 kB
          Andrew Kyle Purtell

          People

            Assignee: Dave Latham (davelatham)
            Reporter: Dave Latham (davelatham)
            Votes: 0
            Watchers: 19
