Key: HBASE-50
Type: New Feature
Status: In Progress
Priority: Minor
Assignee: Alex Newman
Reporter: Billy Pearson
Votes: 1
Watchers: 3
Hadoop HBase

Snapshot of table

Created: 28/Dec/07 08:48 AM   Updated: 17/Nov/09 05:07 AM
Component/s: None
Affects Version/s: None
Fix Version/s: None

Time Tracking:
Not Specified


Description
Having an option to take a snapshot of a table would be very useful in production.

What I would like this option to do is merge all the data into one or more files stored in the same folder on the DFS. This way we could save data in case of a software bug in Hadoop or in user code.

The other advantage would be the ability to export a table to multiple locations. Say I had a read-only table that must stay online. I could take a snapshot of it when needed, export it to a separate data center, and load it there. I would then have it online at multiple data centers for load balancing and failover.

I understand that Hadoop removes the need for backups to protect against failed servers, but this does not protect us from software bugs that might delete or alter data in ways we did not plan. We should have a way to roll back a dataset.



Comments
Billy Pearson added a comment - 28/Dec/07 09:00 AM
Backup/snapshot should take each region as is and copy it to a folder. The metadata for the table should also be backed up with the snapshot.

For a fast restore we could:
stop serving (disable) the table
delete the current regions and metadata for the table
copy the backup regions into the correct locations for hbase region serving
reload the backup metadata
enable the table

On the next rescan by the master, the new meta would be picked up and the master could start assigning the regions to regionservers. This way no time is spent reloading the data.
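
The restore sequence above can be sketched roughly as follows. This is only an in-memory illustration: `ToyCluster` and `restore_table` are hypothetical names made up for the example, not HBase APIs, and the real work (file copies, catalog updates) is replaced by dictionary operations.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """In-memory stand-in for a cluster; none of this is a real HBase API."""
    regions: dict = field(default_factory=dict)  # table -> {region name: data}
    meta: dict = field(default_factory=dict)     # table -> {region name: metadata}
    enabled: set = field(default_factory=set)    # tables currently being served


def restore_table(cluster, table, backup_regions, backup_meta):
    """Replace a table's regions and metadata with a backup copy."""
    cluster.enabled.discard(table)                 # stop serving (disable) the table
    cluster.regions.pop(table, None)               # delete the current regions
    cluster.meta.pop(table, None)                  # ... and the current metadata
    cluster.regions[table] = dict(backup_regions)  # copy the backup regions into place
    cluster.meta[table] = dict(backup_meta)        # reload the backup metadata
    cluster.enabled.add(table)                     # enable the table again


cluster = ToyCluster(
    regions={"t1": {"r1": "corrupted"}},
    meta={"t1": {"r1": {"start_key": ""}}},
    enabled={"t1"},
)
restore_table(
    cluster,
    "t1",
    {"r1": "good", "r2": "good"},
    {"r1": {"start_key": ""}, "r2": {"start_key": "m"}},
)
print(sorted(cluster.regions["t1"]))
```

Because the backup regions are put in place as they are, nothing has to be re-inserted row by row; only the metadata needs to be picked up again.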


stack added a comment - 29/Dec/07 10:16 PM
Here are some ideas for how this might work, Billy.

HADOOP-1958 talks of making a table read-only. It also talks of being able to send a flush-and-compact command across a cluster/table so that all in-memory entries are persisted, followed by a compaction to tidy up the on-disk representation. Jim is currently working on HADOOP-2478, which will move everything to do with a particular table under a directory named for the table in hdfs. Hadoop has a copy files utility that can take a src in one filesystem and a target in the same or another filesystem and will run a mapreduce job to do a fast copy.

Deploying the backup copy would run pretty much as you suggest, only I'd imagine we'd have a tool that read the backed-up table directory and, for each region found, did an insert into the catalog .META. table (the same tool run with a different option would purge a table from the catalog).
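
A rough sketch of what such a tool might do, assuming one subdirectory per region under the table directory. The `table,region` row format and the `catalog_rows` name are made up for illustration and are not the actual .META. layout.

```python
import tempfile
from pathlib import Path


def catalog_rows(table_dir):
    """List one hypothetical catalog row per region directory found."""
    table_dir = Path(table_dir)
    return [
        f"{table_dir.name},{entry.name}"   # made-up row key: "<table>,<region>"
        for entry in sorted(table_dir.iterdir())
        if entry.is_dir()                  # assume each region is a directory
    ]


def demo():
    # Build a tiny backed-up table directory and scan it.
    with tempfile.TemporaryDirectory() as tmp:
        table = Path(tmp) / "webtable"
        (table / "region-0001").mkdir(parents=True)
        (table / "region-0002").mkdir()
        (table / "README").write_text("not a region")
        return catalog_rows(table)


print(demo())
```

Running the same scan with a "purge" option would produce the same list of rows, to be deleted from the catalog instead of inserted.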


Bryan Duxbury added a comment - 10/Jan/08 10:50 PM
Adjusting the priority down to Minor, as this is new functionality. Also setting Fix Version to 0.17, as we have a lot of stuff to get done before 0.16 as it is.

stack added a comment - 21/Mar/08 03:01 AM
Other ideas. A command on the master would send a signal to all regionservers. They would dump their in-memory content and tell the master when done. They would then take reads but no updates until they got the all-clear from the master. The master would then do a listing of the current content of the filesystem and dump a listing of all files. The all-files listing could then be used as input for a distcp job. The master would either wait until it gets a prompt from the admin that the distcp was complete, or it would give the all-clear right after dumping the catalog of all files. In the latter case, instead of being deleted on compaction or region delete, files would get a '.deleted' suffix. The running distcp, if it couldn't find the original file, would look for the same file with the '.deleted' suffix and copy that instead.
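
The '.deleted' fallback could look something like the sketch below. It uses local files and `shutil` in place of HDFS and distcp, and `copy_with_fallback` is a hypothetical helper, so treat it only as an illustration of the lookup rule.

```python
import shutil
import tempfile
from pathlib import Path

DELETED_SUFFIX = ".deleted"


def copy_with_fallback(listing, src_root, dst_root):
    """Copy each listed file, falling back to its '.deleted' twin if needed."""
    copied = []
    for rel in listing:
        src = Path(src_root) / rel
        if not src.exists():
            # The file was "deleted" (renamed) after the listing was taken.
            src = src.with_name(src.name + DELETED_SUFFIX)
        dst = Path(dst_root) / rel          # the copy keeps the original name
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dst)
        copied.append(rel)
    return copied


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = Path(tmp) / "hbase", Path(tmp) / "backup"
        region = src / "t1" / "r1"
        region.mkdir(parents=True)
        (region / "a").write_text("A")
        (region / "b").write_text("B")
        listing = ["t1/r1/a", "t1/r1/b"]    # the master's file listing
        # A compaction "deletes" b while the copy is still pending.
        (region / "b").rename(region / ("b" + DELETED_SUFFIX))
        copy_with_fallback(listing, src, dst)
        return sorted(p.name for p in (dst / "t1" / "r1").iterdir())


print(demo())
```

Something would still have to clean up the '.deleted' files once the copy is known to be complete; the sketch leaves that out.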

Jim Kellerman added a comment - 21/Mar/08 03:34 AM
Sort of like 'safe mode' in hdfs?

stack added a comment - 02/Sep/08 07:28 PM
HADOOP-3637 "Support for snapshots"

Billy Pearson added a comment - 02/Sep/08 08:04 PM
Ideal solution would be to exec this from hbase

example

exec the snapshot command with the correct args

1. hbase turns the tables into read-only mode
2. we exec the snapshot stuff on hadoop for the hbase dir
3. Then we turn the tables back into read-write mode.

I guess somewhere in there we should remember which tables were
disabled and which were in read-only mode before the snapshot started.
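
One way to remember the earlier table modes is sketched below. The mode strings and the `read_only_for_snapshot` helper are assumptions for the example; nothing here calls hbase or hadoop.

```python
from contextlib import contextmanager


@contextmanager
def read_only_for_snapshot(table_modes):
    """Make writable tables read-only, then restore every table's earlier mode."""
    saved = dict(table_modes)              # remember the modes before the snapshot
    try:
        for name, mode in table_modes.items():
            if mode == "read-write":       # disabled/read-only tables are left alone
                table_modes[name] = "read-only"
        yield table_modes
    finally:
        table_modes.clear()
        table_modes.update(saved)          # restored even if the snapshot fails


modes = {"users": "read-write", "archive": "read-only", "staging": "disabled"}
with read_only_for_snapshot(modes) as during:
    during_snapshot = dict(during)         # the snapshot command would run here
print(during_snapshot)
print(modes)
```

Saving the modes up front means a table that was already read-only or disabled is not accidentally switched to read-write afterwards.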


Clint Morgan added a comment - 03/Oct/08 08:09 PM
I've been thinking about this recently. I'd like to be able to take a
snapshot backup of all of our tables, and a requirement
here is that the snapshot be consistent. What this means with respect
to transactions is that either none or all of a transaction makes it
into the snapshot (atomicity).

Another requirement is to minimize the time we have to be read-only as
much as possible. I'd like to keep it on the order of a few seconds.

My first thought was along the lines of what Stack suggested above:
go to read-only, flush, then copy the files. As I understand it, I
could go back to allowing writes as soon as the memcache flush
begins. The subsequent writes would just go to memory....

Then I realized that once we have proper appending to the
write-ahead-log (HLog), I can simply copy that log over rather
than doing the memcache flush.

So I was thinking it would work roughly like this: (I use the term
message generically here. Originally I was thinking this could all be
orchestrated by passing around HMsgs with the normal mechanism, but
now I think it would be better to do it with explicit RPC calls to
speed things up.)

  • Master sends RegionServers a BeginSnapshot message
  • RegionServers receive BeginSnapshot and put their regions into
    read-only mode, and prevent flushes/compactions/splits.
  • Commit-pending transactions (i.e., transactions which we have voted to
    commit, but not committed yet) for a region are allowed to
    finish. This is needed to ensure atomicity. The time that
    transactions are commit-pending should be very small.
  • After all commit-pending transactions have completed, the Region
    moves the write-ahead log to a new file. The old one(s) will be
    copied in the snapshot. When all regions in a RegionServer are
    ready, it sends a CopyOk message to the Master. This means that our
    hdfs files are ready to be copied.
  • After all RegionServers have sent the CopyOk message, the
    Master sends a WritesOk message to all RegionServers, and begins the HDFS copy.
  • When Regions get the WritesOk message, they can allow writes to the
    memcache and new WAL. (If they need to spill to disk then we have to handle that
    specially. Either abort the snapshot, or spill to something that
    won't be included in the snapshot)
  • After the hdfs copy is done, the Master sends a
    SnapshotComplete message. This tells the RegionServers that they can
    start spilling to disk again.
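
The message flow above can be walked through with a small simulation. The message names come from the list; `ToyRegionServer`, `run_snapshot` and the WAL file names are invented for the sketch, messages are plain method calls rather than RPCs, and commit-pending transactions and the spill-to-disk case are not modelled.

```python
class ToyRegionServer:
    """Stand-in for a RegionServer; not an HBase class."""

    def __init__(self, name):
        self.name = name
        self.writable = True       # accepting client writes
        self.can_spill = True      # flushes/compactions/splits allowed
        self.wal_files = [f"{name}-wal-0"]

    def begin_snapshot(self):
        """BeginSnapshot: go read-only, stop spilling, roll the WAL."""
        self.writable = False
        self.can_spill = False
        # Commit-pending transactions would be allowed to finish here.
        old_wals = list(self.wal_files)
        self.wal_files = [f"{self.name}-wal-{len(old_wals)}"]
        return "CopyOk", old_wals

    def writes_ok(self):
        """WritesOk: writes resume, to memcache and the new WAL only."""
        self.writable = True

    def snapshot_complete(self):
        """SnapshotComplete: spilling to disk is allowed again."""
        self.can_spill = True


def run_snapshot(servers, copy_files):
    """Master side: collect CopyOk from everyone, reopen writes, then copy."""
    to_copy = []
    for server in servers:
        reply, old_wals = server.begin_snapshot()
        if reply != "CopyOk":
            raise RuntimeError(f"{server.name} not ready: {reply}")
        to_copy.extend(old_wals)
    for server in servers:
        server.writes_ok()
    copy_files(to_copy)            # stand-in for the HDFS copy
    for server in servers:
        server.snapshot_complete()
    return to_copy


servers = [ToyRegionServer("rs1"), ToyRegionServer("rs2")]
state_during_copy = []
copied = run_snapshot(
    servers,
    lambda files: state_during_copy.extend(
        (s.writable, s.can_spill) for s in servers
    ),
)
print(copied)
print(state_during_copy)
```

In this toy version the read-only window is just the span between `begin_snapshot` and `writes_ok`; the copy itself runs while writes are already allowed again.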

So how does this sound? It seems I can avoid the memcache flush if I
really trust my WAL. And it seems I should be able to keep the
read-only time fairly low. Any problems I'm not seeing?


Billy Pearson added a comment - 03/Oct/08 08:36 PM
I foresee problems with running compactions. We might have to get a compaction lock on all the regions before starting all of the above.

Alex Newman added a comment - 17/Nov/09 05:07 AM
Say you flushed the logs and then ran a compaction and waited for the cluster to chill out. Unless you had extremely high churn rates, I would suggest a mapreduce job with one region per task, where a task fails and retries if a flush or compaction happens. In the case of a split you can fail the job, disable splitting, or have some way of getting the children later.

Even though the data would be captured over a period of time rather than at a single instant, you would at least have a timestamp of when that backup was made, i.e. consistency at the region level, which is all a lot of us really want.
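
The fail-and-retry idea for a single region task might look like the sketch below. Detecting a flush or compaction by comparing the region's file listing before and after the copy is an assumption of the example, and `copy_region` is a hypothetical name; splits are not handled.

```python
def copy_region(read_files, max_attempts=3):
    """Copy one region, retrying if its file set changes during the copy."""
    for attempt in range(1, max_attempts + 1):
        before = read_files()
        copied = list(before)          # stand-in for copying the files
        if read_files() == before:     # no flush or compaction got in the way
            return copied, attempt
    raise RuntimeError("region kept changing; giving up")


# Simulated listings: a compaction merges 'a' and 'b' into 'ab' mid-copy,
# so the first attempt fails and the second one succeeds.
listings = iter([["a", "b"], ["ab"], ["ab"], ["ab"]])
files, attempts = copy_region(lambda: next(listings))
print(files, attempts)
```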