HBASE-974: New Backup Export/Import MR for 0.18/0.19

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: 0.18.0, 0.18.1, 0.19.0
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      Erik has built a new backup tool that's compatible with newer versions of HBase.

      Attachments

      1. ExportMR.java (5 kB) - Erik Holstad
      2. ExportMR.sh (1 kB) - Erik Holstad
      3. ImportMR.java (6 kB) - Erik Holstad
      4. ImportMR.sh (1 kB) - Erik Holstad
      5. makeExportJar.sh (0.2 kB) - Erik Holstad
      6. makeImportJar.sh (0.2 kB) - Erik Holstad

        Activity

        Jonathan Gray added a comment -

        This issue contains tools to perform the backup on 0.18 and 0.19. There are no plans to commit any of this into those branches.

        Closing this issue as Won't Fix. The 0.20 backup is now being worked on in HBASE-1684.

        Jonathan Gray added a comment -

        TableReduce has had some performance issues in the past. I think it's pretty good now though.

        Yes, you'll definitely want to turn off autoflush. Honestly, I don't think that option existed when these jobs were written.

        I will close these issues once I open an 0.20 issue.
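
        For reference, a minimal sketch of turning autoflush off on the client side, assuming the 0.19 HTable API (setAutoFlush/flushCommits); the table and column names here are made up for illustration:

        import java.io.IOException;

        import org.apache.hadoop.hbase.HBaseConfiguration;
        import org.apache.hadoop.hbase.client.HTable;
        import org.apache.hadoop.hbase.io.BatchUpdate;

        public class AutoFlushExample {
          public static void main(String[] args) throws IOException {
            // Hypothetical table and column names, purely for illustration.
            HTable table = new HTable(new HBaseConfiguration(), "backup_test");
            // Buffer writes client-side instead of committing each BatchUpdate on its own.
            table.setAutoFlush(false);
            for (int i = 0; i < 10000; i++) {
              BatchUpdate update = new BatchUpdate("row" + i);
              update.put("data:value", Integer.toString(i).getBytes());
              table.commit(update);
            }
            // Push whatever is still sitting in the client-side write buffer.
            table.flushCommits();
          }
        }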

        atppp added a comment -

        Very interesting idea of directly serializing the RowResult. However, in the importer reducer, as you said, you could create the new HTable in configure(), but you don't even have to do that. You can just let output collect (key, batchUpdate) directly and TableReduce will take care of committing. Plus, TableReduce sets autoflush off, which significantly boosts import performance.

        See the original example in HBASE-897 and my recent changes. (Should we mark 897 as a duplicate or close it?)
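
        A minimal sketch of that approach, written against what I believe the 0.19-era org.apache.hadoop.hbase.mapred API looks like (the output format handles the committing); this is not the attached ImportMR.java:

        import java.io.IOException;
        import java.util.Iterator;

        import org.apache.hadoop.hbase.io.BatchUpdate;
        import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
        import org.apache.hadoop.hbase.mapred.TableReduce;
        import org.apache.hadoop.mapred.JobConf;
        import org.apache.hadoop.mapred.OutputCollector;
        import org.apache.hadoop.mapred.Reporter;

        public class ImportReduce
            implements TableReduce<ImmutableBytesWritable, BatchUpdate> {

          public void configure(JobConf conf) {
            // No HTable needed here: the table output format owns the table handle.
          }

          public void reduce(ImmutableBytesWritable row, Iterator<BatchUpdate> values,
              OutputCollector<ImmutableBytesWritable, BatchUpdate> output, Reporter reporter)
              throws IOException {
            while (values.hasNext()) {
              // Hand each BatchUpdate to the collector; the output format commits it.
              output.collect(row, values.next());
            }
          }

          public void close() throws IOException {
            // Nothing to clean up.
          }
        }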

        Erik Holstad added a comment -

        At first I didn't really understand why I created a new HTable in every reduce call, but today it
        struck me that we had it set up in another way. We have something like a pool of tables that
        you check in and out, but it has dependencies, so that is why I removed it. Of course it
        doesn't make any sense to have it the way it is now; it just slows things down a lot.
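
        For what it's worth, a hedged sketch of opening the HTable once per task in configure() instead of once per reduce() call, assuming the 0.18/0.19 client API; the "import.table" job property is made up:

        import java.io.IOException;

        import org.apache.hadoop.hbase.HBaseConfiguration;
        import org.apache.hadoop.hbase.client.HTable;
        import org.apache.hadoop.mapred.JobConf;

        public class ImportReducerBase {
          protected HTable table;

          // configure() runs once per reduce task, so the HTable (and its connection)
          // is reused across all reduce() invocations instead of being recreated each time.
          public void configure(JobConf conf) {
            try {
              table = new HTable(new HBaseConfiguration(conf), conf.get("import.table"));
            } catch (IOException e) {
              throw new RuntimeException("Could not open HTable", e);
            }
          }
        }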

        Erik Holstad added a comment -

        Hey Stack!
        I made some changes to the files so that they look better according to the coding standards and use
        util.Writables instead.
        I haven't tested the functionality yet, so I will post the new code once I have, so you can comment on it again.
        I'm not really sure where it fits best, but I would say examples for now.
        I don't want to spend too much time fixing the code, though, since I think we will have more efficient ways
        of doing the backup before long, once we start working with Cascading for example, but I'll see what Yair says
        and post the updated code after that.

        stack added a comment -

        Erik:

        Would suggest that when you are successful with Yair, you get him to +1 this issue. On the ExportMR job, I wonder if it's possible to set maps == 0 so you don't have to supply a map task? Should we commit these classes to hbase? Into contrib or under examples? I like the way they serialize the RowResult. Good idea. If we're going to commit, they need Apache licenses and the style fixed up (don't ask jgray - he'll only tell you the wrong thing... smile). For the below, check Writables in hbase util. I think there are methods there to help you do this:

        ByteArrayInputStream bis = new ByteArrayInputStream(((BytesWritable) val).get()); // baos.toByteArray());
        DataInputStream dis = new DataInputStream(bis);
        RowResult rowRes = new RowResult();
        rowRes.readFields(dis);

        If you use HbaseMapWritable instead of MapWritable, you could do without Text and toString'ing the table name (I think). In HBASE-880, I believe RowResult and BatchUpdate get the same ancestor. It would be sweet if they could be used interchangeably so you wouldn't need to do the conversion in rowResultToBatchUpdate. Do you think it makes sense to create the new HTable each time reduce is invoked, rather than in its configure step?
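
        For reference, a hedged sketch of the util.Writables shortcut mentioned above, assuming org.apache.hadoop.hbase.util.Writables.getWritable(byte[], Writable) behaves as in the 0.18/0.19 codebase:

        import java.io.IOException;

        import org.apache.hadoop.hbase.io.RowResult;
        import org.apache.hadoop.hbase.util.Writables;
        import org.apache.hadoop.io.BytesWritable;

        public class RowResultCodec {
          /** Rebuild a RowResult from the raw Writable bytes carried in a BytesWritable. */
          static RowResult toRowResult(BytesWritable val) throws IOException {
            // Writables.getWritable fills the supplied instance from the byte array,
            // replacing the ByteArrayInputStream/DataInputStream sequence quoted above.
            return (RowResult) Writables.getWritable(val.get(), new RowResult());
          }
        }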

        Erik Holstad added a comment -

        Removed the dependency on HBaseRef since it is not needed.

        Erik Holstad added a comment -

        MapReduce files for importing data into and exporting data from HBase tables.
        The extra scripts used to create the jars and execute the jobs are attached for reference.

        sishen added a comment -

        I have patched Dan's previous work on this tool to work with 0.18/0.19.

        However, I'm looking forward to seeing this effort.


          People

          • Assignee: Erik Holstad
          • Reporter: Jonathan Gray
          • Votes: 0
          • Watchers: 0
