Dan Zinngrabe added a comment - 22/Sep/08 11:03 PM
Unzip and run 'ant build' to create the binary. Documentation is included in the readme. Note that while this has primarily been used with 0.1.2 and 0.1.3, it should be usable on newer versions with little or no modifications.
This looks excellent, Dan.
I tried to build it but the lib dir is empty; my guess is it's supposed to be populated with some subset of hadoop and hbase jars and lib content. But more important, do you think we should bundle this tool with hbase itself? What would you suggest? Perhaps add it as a subpackage under hbase mapred? The README could be redone as package-level documentation? Or do you think it better that it remain its own self-contained thing? If so, where should it live other than as a JIRA attachment? Good stuff. (Just back from Hawaii, so I know what Mahalo means now.)
Yes, you need the hbase and hadoop jars either in the lib directory or on your classpath for it to build properly.
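For anyone hitting the empty lib dir, here is a minimal sketch of one way to populate it before building. It assumes local Hadoop and HBase installs whose jars sit at the top level or one directory down; the function name `populate_lib` and the `HADOOP_HOME`/`HBASE_HOME` paths in the usage comment are illustrative and not part of the attachment.

```shell
#!/bin/sh
# Hypothetical helper (not part of the attached tool): copy every jar found
# under the given install directories into a target lib directory.
# Usage: populate_lib LIB_DIR INSTALL_DIR...
populate_lib() {
    lib_dir=$1
    shift
    mkdir -p "$lib_dir" || return 1
    for src in "$@"; do
        # -maxdepth 2 picks up top-level jars and those one directory down
        # (for example the install's own lib/ directory).
        find "$src" -maxdepth 2 -name '*.jar' -exec cp {} "$lib_dir"/ \;
    done
}

# Example, run from the unzipped tool directory (paths are assumptions):
#   populate_lib lib "$HADOOP_HOME" "$HBASE_HOME"
#   ant build
```

Copying more jars than strictly needed is harmless for a build; putting the same jars on the classpath instead should work equally well, as noted above.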
This hasn't been tested with the most recent HBase and Hadoop releases, but I can find no reason it would not work other than class name changes. I think including it in hbase may be a good idea: being able to export and import data even just for testing purposes is valuable to developers, and the backup capability is something people have asked for quite a bit. Until there is a more robust backup tool like what has been suggested for HBASE-50, this would certainly be a reasonable stopgap. Since for backup purposes the tool is likely to be deployed and used by systems administrators, the README should probably remain separate for now; that makes it easier to get it into their hands.
I'll give that a shot; it shouldn't present any problems that I can see.
I'll put most of the readme into the package docs, and I'll see if I can do a version of it targeted at sysadmins for the wiki.
Dan: Thanks. One minor thing: rather than put the doc in the wiki, if it's in the javadoc, it can evolve along with the hbase versions. Also, to be clear, you've run this MR job against a 'live' instance? (Just asking. Someone off the mailing list was looking for such a thing and I pointed them here.) Finally, any chance of adding Mahalo to the powered-by page? I'm making the rounds trying to get fellas to add their names; it's empty now and that gives off a bad impression. Good stuff.
That's correct: look at www.mahalo.com . All the markup that powers the wiki is stored in HBase, and backed up using this tool every hour. It's been in use for a few months now. MediaWiki, the same software that powers Wikipedia, has version/revision control. Mahalo's in-house editors produce a lot of revisions per day, which was not working well in an RDBMS. An hbase-based solution for this was built and tested, and the data migrated out of MySQL using this tool (and a few python scripts) and into HBase. Right now it's at something like 6 million items in HBase. The tool runs every hour from a shell script to back up that data, and on 6 nodes takes about 5-10 minutes to run, and it does not slow down production at all. So it's not just a backup, it's a hot backup.
Mahalo has no problem with being added to the powered-by page.
Dan Zinngrabe
Thanks for your wonderful tool. It's really very helpful. In the JIRA, it's recorded as affecting hbase 0.1.2 and 0.1.3. Now hbase 0.19.1 is released and I found that the tool is not compatible with it. Do you have a new version that fixes the problems? Thanks.
Good stuff, atppp. FYI there was another issue opened for an 0.19 version
I'm working on a new one for 0.20 soon, will have it up next week in a new issue and will post here. Issue contains tools to perform this on really old versions and also on 0.19. No plans to commit any of this into branches.
Other implementations for 0.18/0.19 available in Closing issue as Won't Fix. 0.20 backup now being worked on in