Here are some ideas for how this might work, Billy.
HADOOP-1958 talks of making a table read-only. It also talks of being able to send a flush-and-compact command across a cluster/table so all in-memory entries are persisted, followed by a compaction to tidy up the on-disk representation. Jim is currently working on HADOOP-2478, which will move everything to do with a particular table under a directory named for the table in hdfs. Hadoop has a copy files utility that can take a src in one filesystem and a target in the same or another filesystem and will run a mapreduce job to do a fast copy. Deploying the backup copy would run pretty much as you suggest, only I'd imagine we'd have a tool that read the backed-up table directory and, per region found, did an insert into the catalog .META. table (the same tool run with a different option would purge a table from the catalog).
Adjusting the priority down to Minor, as this is new functionality. Also setting Fix Version to 0.17, as we have a lot of stuff to get done before 0.16 as it is.
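To make the catalog tool idea above a bit more concrete, here is a rough sketch only. It assumes a layout of one subdirectory per region under the table directory, and it uses a plain dict as a stand-in for the .META. table. The function names and the row shape are made up for illustration; none of this is an existing hbase API.

```python
from pathlib import Path


def regions_in_backup(table_dir):
    """Return the names of the region directories under a backed-up table directory."""
    return sorted(p.name for p in Path(table_dir).iterdir() if p.is_dir())


def register_table(catalog, table_dir):
    """Add one catalog row per region found in the backup (hypothetical row shape)."""
    table = Path(table_dir).name
    for region in regions_in_backup(table_dir):
        catalog[(table, region)] = {"path": str(Path(table_dir) / region)}
    return catalog


def purge_table(catalog, table):
    """Remove every catalog row that belongs to the given table."""
    for key in [k for k in catalog if k[0] == table]:
        del catalog[key]
    return catalog
```

The same module covers both options: `register_table` for deploying a backup and `purge_table` for removing a table from the catalog.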
Other ideas. A command on the master would send a signal to all regionservers. They would dump their in-memory content and tell the master when done. They would then take reads but no updates until they got the all-clear from the master. The master would then do a listing of the current content of the filesystem and dump a file listing of all files. The all-files listing could then be used as input for a distcp job. The master would either wait until it gets a prompt from the admin that the distcp was complete, or it would give the all-clear right after the dump of the catalog of all files. In the second case, instead of deleting files on compaction or region delete, the files would get a '.deleted' suffix. The running distcp, if it couldn't find the original file, would look for the same file with the '.deleted' suffix and copy that instead.
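The '.deleted' fallback could look something like the sketch below. This is only an illustration: the filesystem is modelled as a set of path strings, and `mark_deleted`, `resolve_source` and `plan_copy` are hypothetical helper names, not part of distcp or hbase.

```python
DELETED_SUFFIX = ".deleted"


def mark_deleted(files, path):
    """Instead of deleting a file, rename it by appending the '.deleted' suffix."""
    files.discard(path)
    files.add(path + DELETED_SUFFIX)


def resolve_source(files, path):
    """Return the path to copy for a listed file, or None if it is gone entirely."""
    if path in files:
        return path
    fallback = path + DELETED_SUFFIX
    return fallback if fallback in files else None


def plan_copy(listing, files):
    """Map each listed file to the file to copy; collect the ones that cannot be found."""
    plan, missing = {}, []
    for path in listing:
        source = resolve_source(files, path)
        if source is None:
            missing.append(path)
        else:
            plan[path] = source
    return plan, missing
```

With this, a compaction that runs while the copy is in flight only renames its inputs, so the copy can still find every file named in the listing taken at snapshot time.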
Sort of like 'safe mode' in hdfs?
Ideal solution would be to exec this from hbase.
Example: exec the snapshot command with the correct args.
1. hbase turns the tables into read-only mode
I guess somewhere in there we should remember what tables where
I've been thinking about this recently. I'd like to be able to take a snapshot backup of all of our tables, and a requirement here is that the snapshot be consistent. What this means with respect to transactions is that either none or all of a transaction makes it into the snapshot (atomicity).
Another requirement is to minimize the time we have to be read-only as
My first thinking was along the lines of what Stack's suggested above:
Then I realized that once we have proper appending to the
So I was thinking it would work roughly like this: (I use the term
So how does this sound? It seems I can avoid the memcache flush if I
I foresee problems with running compactions; we might have to get a compaction lock on all the regions before starting all of the above.
Say you flushed the logs and then ran a compaction and waited for the cluster to chill out. Unless you had extremely high churn rates, I would suggest a mapreduce with a region per task, where a task fails and retries in case of flushes or compactions. In the case of a split you can fail the job, disable splitting, or have some way of getting the children later.
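A per-region task with fail-and-retry could be sketched roughly as follows. The assumption here (mine, not something hbase provides) is that a flush or compaction shows up as a change in the region's file listing, so the task compares the listing before and after the copy. `list_files` and `copy_files` are hypothetical callables supplied by the caller.

```python
def copy_region_with_retry(list_files, copy_files, max_attempts=3):
    """Copy one region's files, retrying if the file set changed during the copy.

    list_files: callable returning the region's current file names.
    copy_files: callable that copies the given set of file names.
    Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        before = frozenset(list_files())
        copy_files(before)
        after = frozenset(list_files())
        if before == after:
            # No flush or compaction touched the region while we copied.
            return attempt
    raise RuntimeError("region kept changing after %d attempts" % max_attempts)
```

A split is not handled here; as noted above, that case would need the job to fail, splitting to be disabled, or some way of picking up the children later.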
Even though the data you'd get would kind of span a time period, you would at least have a timestamp of when that backup was made, i.e. consistency on a region level, which is all a lot of us really want.
For a fast load of the restore we could:
stop serving (disable) the table
delete the current regions and metadata for the table
copy the backup regions into the correct locations for hbase region serving
reload the backup metadata
enable the table
On the next rescan by the master, the new meta would be picked up and the master could start assigning the regions to regionservers. This way no time is spent reloading the data.
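The ordering of the restore steps above can be sketched like this. It is a toy model only: the cluster is a dict with hypothetical `enabled`, `regions` and `meta` entries, meta rows are keyed by `(table, region)`, and `restore_table` is an invented name rather than an hbase command.

```python
def restore_table(cluster, table, backup):
    """Swap a table's regions and metadata for the backed-up copies."""
    # 1. Stop serving (disable) the table.
    cluster["enabled"][table] = False
    # 2. Delete the current regions and metadata for the table.
    cluster["regions"].pop(table, None)
    cluster["meta"] = {k: v for k, v in cluster["meta"].items() if k[0] != table}
    # 3. Copy the backup regions into the locations used for region serving.
    cluster["regions"][table] = dict(backup["regions"])
    # 4. Reload the backup metadata.
    cluster["meta"].update(backup["meta"])
    # 5. Enable the table; assignment would happen on the master's next rescan.
    cluster["enabled"][table] = True
    return cluster
```

Nothing in this sequence replays the data row by row, which is where the time saving would come from.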