HBASE-26323: Introduce a SnapshotProcedure

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.6.0, 3.0.0-alpha-3
    • Component/s: proc-v2, snapshots
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note: Introduce a snapshot procedure to snapshot tables. Set hbase.snapshot.procedure.enabled to false to disable snapshot procedure.
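
A minimal sketch of how such a flag might be read. It uses java.util.Properties as a stand-in for HBase's own configuration class, and the default of true is an assumption inferred from the release note wording ("set ... to false to disable"), not something the issue states.

```java
import java.util.Properties;

class SnapshotProcedureFlag {
    // Key taken verbatim from the release note.
    static final String KEY = "hbase.snapshot.procedure.enabled";

    // Assumption: the feature is on unless the key is explicitly set to false.
    static boolean isEnabled(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty(KEY, "true"));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(isEnabled(conf)); // key unset, falls back to the assumed default
        conf.setProperty(KEY, "false");
        System.out.println(isEnabled(conf)); // explicitly disabled
    }
}
```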

    Description

      Currently, snapshots in HBase use ZooKeeper as the coordinator. This approach has some limitations:
      a. A snapshot may fail when a region server crashes.
      b. A snapshot may fail when the master restarts.
      c. Only one snapshot per table can be taken at a time.
      d. Snapshot verification is handled by the master, which may take a long time when the table has a large number of regions, for example 10000.

       

      Since we now have the procedure v2 framework, it is possible to solve the above problems. So here is a procedure-v2-based snapshot implementation. It has the following goals:
      a. A snapshot can continue when a region server crashes.
      b. A snapshot can continue when the master restarts.
      c. More than one snapshot per table can be taken at a time.
      d. Region servers can be used to verify the snapshot, which speeds up the procedure.

       

      Here are some details about the implementation.
      SnapshotProcedure
      SnapshotProcedure is used to take a snapshot of a table. It acquires a shared table lock on the snapshot table and holds the shared lock during suspend and yield.
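
The shared-lock behaviour can be illustrated with a toy model. This is only a sketch of shared versus exclusive semantics: TableLockSketch and its methods are hypothetical names, and HBase's real procedure scheduler locks are not implemented this way.

```java
class TableLockSketch {
    private int sharedHolders = 0;      // e.g. concurrent snapshot procedures
    private boolean exclusiveHeld = false;

    // A shared lock is granted unless an exclusive holder exists.
    synchronized boolean tryShared() {
        if (exclusiveHeld) return false;
        sharedHolders++;
        return true;
    }

    // An exclusive lock is granted only when nobody holds the lock at all.
    synchronized boolean tryExclusive() {
        if (exclusiveHeld || sharedHolders > 0) return false;
        exclusiveHeld = true;
        return true;
    }

    synchronized void releaseShared() {
        if (sharedHolders > 0) sharedHolders--;
    }

    synchronized void releaseExclusive() {
        exclusiveHeld = false;
    }

    public static void main(String[] args) {
        TableLockSketch table = new TableLockSketch();
        System.out.println(table.tryShared());    // first snapshot: granted
        System.out.println(table.tryShared());    // second snapshot on the same table: granted
        System.out.println(table.tryExclusive()); // exclusive request while snapshots run: refused
    }
}
```

In this model two snapshots of the same table can hold the lock together, while any operation that needs the table exclusively has to wait until both release it.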
      SnapshotRegionProcedure
      SnapshotRegionProcedure is used to take a snapshot of a specific region of the snapshot table. It acquires an exclusive region lock and releases the lock during suspend and yield. Before dispatching the remote snapshot operation to a region server, it checks whether the target region is in transition (RIT). If it is, the procedure sleeps for some time and retries.
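
The check-then-retry step might look roughly like the following. This is a sketch under stated assumptions: the class and method names are hypothetical, the doubling backoff, its 1000 ms cap, and the attempt limit are illustrative choices the issue does not specify, and a real procedure would suspend itself rather than block a thread with Thread.sleep.

```java
import java.util.function.BooleanSupplier;

class RitRetrySketch {
    // Returns the attempt number on which the region was found stable
    // (the point where the remote operation would be dispatched),
    // or -1 if the region stayed in transition or the wait was interrupted.
    static int dispatchWhenStable(BooleanSupplier regionInTransition,
                                  long initialBackoffMs, int maxAttempts) {
        long backoff = initialBackoffMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (!regionInTransition.getAsBoolean()) {
                return attempt; // region is stable, safe to dispatch
            }
            try {
                Thread.sleep(backoff); // stand-in for suspending the procedure
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return -1;
            }
            backoff = Math.min(backoff * 2, 1000L); // assumed backoff policy
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] checks = {0};
        // The region reports "in transition" for the first two checks only.
        int attempt = dispatchWhenStable(() -> checks[0]++ < 2, 1L, 5);
        System.out.println(attempt); // prints 3
    }
}
```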
      SnapshotVerifyProcedure
      SnapshotVerifyProcedure is used to send a snapshot verification request to a region server. If the snapshot is corrupted, it notifies the parent snapshot procedure to retry. If the remote region server has crashed, it chooses another online server and retries.
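
A rough sketch of the failover logic described for verification. All names here (VerifyFailoverSketch, Verifier, ServerCrashed, Outcome) are hypothetical, and trying the servers in list order is an assumption; the issue only says that another online server is chosen.

```java
import java.util.Arrays;
import java.util.List;

class VerifyFailoverSketch {
    enum Outcome { VERIFIED, CORRUPTED, NO_SERVER_AVAILABLE }

    // Hypothetical signal that the target region server died mid-request.
    static class ServerCrashed extends RuntimeException {
        ServerCrashed(String server) { super("server crashed: " + server); }
    }

    // Hypothetical remote call: true if the snapshot data verified cleanly.
    interface Verifier {
        boolean verifyOn(String server);
    }

    static Outcome verify(List<String> onlineServers, Verifier verifier) {
        for (String server : onlineServers) {
            try {
                // CORRUPTED is where the parent procedure would be told to retry.
                return verifier.verifyOn(server) ? Outcome.VERIFIED : Outcome.CORRUPTED;
            } catch (ServerCrashed e) {
                // This server is gone: fall through and try the next online one.
            }
        }
        return Outcome.NO_SERVER_AVAILABLE;
    }

    public static void main(String[] args) {
        Outcome outcome = verify(Arrays.asList("rs1", "rs2"), server -> {
            if (server.equals("rs1")) throw new ServerCrashed(server);
            return true;
        });
        System.out.println(outcome); // prints VERIFIED
    }
}
```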

       

      I would be very grateful for any advice and guidance. Is anyone interested in taking a look?

            People

              Assignee: frostruan ruanhui
              Reporter: frostruan ruanhui