
Key: HBASE-974
Type: New Feature
Status: Resolved
Resolution: Won't Fix
Priority: Minor
Assignee: Erik Holstad
Reporter: Jonathan Gray
Votes: 0
Watchers: 0
Hadoop HBase

New Backup Export/Import MR for 0.18/0.19

Created: 31/Oct/08 04:45 PM   Updated: 22/Jul/09 10:06 PM
Component/s: None
Affects Version/s: 0.18.0, 0.18.1, 0.19.0
Fix Version/s: None

Time Tracking:
Not Specified

File Attachments (all licensed for inclusion in ASF works):
  ExportMR.java (Java source file), 5 kB, 2009-01-24 01:25 AM, Erik Holstad
  ExportMR.sh, 1 kB, 2009-01-24 01:26 AM, Erik Holstad
  ImportMR.java (Java source file), 6 kB, 2009-01-24 01:25 AM, Erik Holstad
  ImportMR.sh, 1 kB, 2009-01-24 01:26 AM, Erik Holstad
  makeExportJar.sh, 0.2 kB, 2009-01-24 01:26 AM, Erik Holstad
  makeImportJar.sh, 0.2 kB, 2009-02-06 05:48 PM, Erik Holstad

Resolution Date: 22/Jul/09 10:06 PM


Description:
Erik has built a new backup tool that's compatible with newer versions of HBase.

sishen added a comment - 03/Nov/08 09:24 AM
I have patched Dan's previous work on this tool to work with 0.18/0.19.

That said, I am looking forward to seeing this effort.


Erik Holstad added a comment - 24/Jan/09 01:25 AM
MapReduce files for importing and exporting data from HBase tables.
Extra scripts used to create the jars and execute the jobs are attached for reference.

Erik Holstad added a comment - 06/Feb/09 05:48 PM
Removed the dependency on HBaseRef since it is not needed.

stack added a comment - 07/Feb/09 08:50 PM
Erik:

Would suggest that when you are successful with Yair, you get him to +1 on this issue. On the ExportMR job, I wonder if it's possible to set maps == 0 so you don't have to supply a map task? Should we commit these classes to hbase? Into a contrib or under examples? I like the way they serialize RR. Good idea. If we're going to commit, they need Apache licenses and the style fixed up (don't ask jgray – he'll only tell you the wrong thing... :)). For the below, check Writables in hbase util. I think there are methods there to help you do the below:

ByteArrayInputStream bis = new ByteArrayInputStream(((BytesWritable) val).get()); // baos.toByteArray());
DataInputStream dis = new DataInputStream(bis);
RowResult rowRes = new RowResult();
rowRes.readFields(dis);

If you use HbaseMapWritable instead of MW, you could do without Text and toString'ing the table name (I think). In 880, I believe RowResult and BatchUpdate have the same ancestor. Would be sweet if they could be used interchangeably so you wouldn't need to do the conversion in rowResultToBatchUpdate. Do you think it makes sense to create the new HTable in the reduce each time it's invoked rather than in its configure step?
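
For readers who have not seen the pattern in the snippet above, here is a minimal sketch of the byte-array round trip that a Writables-style helper wraps. It uses only java.io. SimpleWritable, DemoRow and WritableBytes are hypothetical stand-ins invented for this illustration; they are not the HBase or Hadoop classes, and the attached ImportMR.java may do this differently.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a Writable-style contract (not the Hadoop interface).
interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical stand-in for a row: a row key plus a single cell value.
class DemoRow implements SimpleWritable {
    byte[] row = new byte[0];
    byte[] value = new byte[0];

    DemoRow() {}

    DemoRow(byte[] row, byte[] value) {
        this.row = row;
        this.value = value;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        // Length-prefix each field so readFields knows how much to read.
        out.writeInt(row.length);
        out.write(row);
        out.writeInt(value.length);
        out.write(value);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        row = new byte[in.readInt()];
        in.readFully(row);
        value = new byte[in.readInt()];
        in.readFully(value);
    }
}

// Helper that hides the stream plumbing, so callers only see byte[] in and out.
class WritableBytes {
    static byte[] toBytes(SimpleWritable w) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(bos)) {
            w.write(dos);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    static <T extends SimpleWritable> T fromBytes(byte[] bytes, T target) {
        try (DataInputStream dis = new DataInputStream(new ByteArrayInputStream(bytes))) {
            target.readFields(dis);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return target;
    }
}
```

With a helper like this, the four lines quoted above collapse into a single call of the form WritableBytes.fromBytes(bytes, new DemoRow()).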


Erik Holstad added a comment - 07/Feb/09 10:17 PM
Hey Stack!
I made some changes to the files so that they look better according to the standards and use
util.Writables instead.
I haven't tested the functionality yet, so I will post the new code once I have, so you can comment on it again.
Not really sure where it fits best, but I would say examples for now.
I don't want to spend too much time fixing the code though, since I think we will have more efficient ways
of doing the backup in a little while, when we start messing with Cascading for example, but we will see what Yair says
and after that I will post the updated code.

Erik Holstad added a comment - 12/Feb/09 06:55 AM
At first I didn't really understand why I created a new HTable in every reducer, but today it
struck me that we had it set up in another way. We have something like a pool of tables that
you check in and out, but it has dependencies, so that is why I removed it. Of course it
doesn't make any sense to have it the way it is now; it just slows things down a lot.
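
To make the cost concrete, here is a minimal sketch contrasting a table handle opened on every reduce call with one opened once in a configure step. DemoTable, PerCallReducer and ConfiguredReducer are hypothetical stand-ins written for this illustration; they do not use HTable or the MapReduce API, and the counter only models "how many times the handle was opened".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an expensive-to-open table handle (not HTable).
class DemoTable {
    static int opened = 0; // counts how many handles were created
    private final List<String> rows = new ArrayList<>();

    DemoTable() {
        opened++;
    }

    void commit(String row) {
        rows.add(row);
    }

    int size() {
        return rows.size();
    }
}

// Opens a new handle for every reduce call, as in the code discussed above.
class PerCallReducer {
    void reduce(String row) {
        DemoTable table = new DemoTable();
        table.commit(row);
    }
}

// Opens the handle once in configure() and reuses it for every reduce call.
class ConfiguredReducer {
    private DemoTable table;

    void configure() {
        table = new DemoTable();
    }

    void reduce(String row) {
        table.commit(row);
    }

    int rowsWritten() {
        return table.size();
    }
}
```

In this model, N reduce calls cost N opens in the first reducer and exactly one in the second, which is the slowdown described above.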

atppp added a comment - 20/Jul/09 04:19 AM
Very interesting idea to serialize RR directly. However, in the importer reducer, as you said, you could create the new HTable in configure(), but you don't even have to do that. You can just let the output collector collect (key, batchUpdate) and TableReduce will take care of committing. Plus, TableReduce sets autoflush off, which significantly boosts import performance.

See the original example in HBASE-897 and my recent changes. (Should we mark 897 as a duplicate or close it?)
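
As a rough illustration of why turning autoflush off helps, the sketch below models a client that either sends every update immediately or buffers updates and sends them in batches. BufferedDemoTable is a hypothetical stand-in, not the HBase client. The count-based buffer limit is an assumption made for this sketch, and roundTrips only models the number of calls that would go to the server.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a table client with an autoflush switch.
class BufferedDemoTable {
    private final boolean autoFlush;
    private final int bufferLimit; // assumed: flush after this many buffered rows
    private final List<String> buffer = new ArrayList<>();

    int roundTrips = 0; // modeled server calls
    int rowsStored = 0; // rows that reached the "server"

    BufferedDemoTable(boolean autoFlush, int bufferLimit) {
        this.autoFlush = autoFlush;
        this.bufferLimit = bufferLimit;
    }

    void commit(String row) {
        buffer.add(row);
        // With autoflush on, every commit is sent on its own.
        if (autoFlush || buffer.size() >= bufferLimit) {
            flush();
        }
    }

    // Sends whatever is buffered in one modeled round trip.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        roundTrips++;
        rowsStored += buffer.size();
        buffer.clear();
    }
}
```

The caller has to invoke flush() once at the end when autoflush is off, otherwise the last partial batch never leaves the buffer.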


Jonathan Gray added a comment - 20/Jul/09 04:13 PM
TableReduce has had some performance issues in the past. I think it's pretty good now though.

Yes, you'll definitely want to turn off autoflush. Honestly, I don't think that option existed when these jobs were written.

I will close these issues once I open a 0.20 issue.


Jonathan Gray added a comment - 22/Jul/09 10:06 PM
This issue contains tools to perform this on 0.18 and 0.19. There are no plans to commit any of this into branches.

Closing issue as Won't Fix. The 0.20 backup is now being worked on in HBASE-1684.