HBASE-1861: Multi-Family support for bulk upload tools (HFileOutputFormat / loadtable.rb)

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.0
    • Fix Version/s: 0.92.0
    • Component/s: mapreduce
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      Add multi-family support to bulk upload tools from HBASE-48.

          Activity

          Ioannis Konstantinou added a comment -

          Is anyone working on this?

          stack added a comment -

          Not to my knowledge. Thinking on it, this case is a little tougher than the single family case.

          1. In the single family case, we just write single files and read the file metadata to create the region (we extract the file's start and end rows and use these when conjuring the region description). In the multiple family case, somehow you'll have to tie all files in a region together – perhaps in metadata or with a file suffix or prefix. I was thinking that you'd keep a running tab on the size of the file in each family and then, as soon as any one file went over the region maximum file size limit, you'd rotate all files.
          2. The loadtable.rb script would need to change to read across all files in a region to find the least first row and the maximum last row by looking at all the files' metadata (sketched below).

          If you want to discuss this issue more, put up some questions and I'll have a stab at them. Thanks.
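
          As a rough illustration of point 2, here is a minimal Java sketch that scans every family's files under a region directory to find the least first row and the greatest last row. The readFirstRow/readLastRow helpers are hypothetical stand-ins for reading the HFile metadata; the concrete reader API differs between versions.

          import java.io.IOException;

          import org.apache.hadoop.fs.FileStatus;
          import org.apache.hadoop.fs.FileSystem;
          import org.apache.hadoop.fs.Path;
          import org.apache.hadoop.hbase.util.Bytes;

          /** Sketch: derive a region's boundaries from all of its families' HFiles. */
          public abstract class RegionBoundariesSketch {

            // Hypothetical helpers standing in for reading an HFile's first/last
            // row from its metadata (the real reader API varies across versions).
            protected abstract byte[] readFirstRow(FileSystem fs, Path hfile) throws IOException;
            protected abstract byte[] readLastRow(FileSystem fs, Path hfile) throws IOException;

            /** Least first row and greatest last row across every family's files. */
            public byte[][] findBoundaries(FileSystem fs, Path regionDir) throws IOException {
              byte[] least = null;
              byte[] greatest = null;
              for (FileStatus family : fs.listStatus(regionDir)) {   // one dir per family
                for (FileStatus hfile : fs.listStatus(family.getPath())) {
                  byte[] first = readFirstRow(fs, hfile.getPath());
                  byte[] last = readLastRow(fs, hfile.getPath());
                  if (least == null || Bytes.compareTo(first, least) < 0) least = first;
                  if (greatest == null || Bytes.compareTo(last, greatest) > 0) greatest = last;
                }
              }
              return new byte[][] { least, greatest };
            }
          }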

          Ioannis Konstantinou added a comment -

          Hi again. One thing I noticed during bulk upload (of a single column family) is a bug in the following scenario (correct me if this is not the case):
          I have a mapper that reads input and emits KeyValue objects to be fed into the KeyValueSortReducer. The mapper emits a number of KeyValue objects for each row. For the same rowid, the KeyValue objects have different columnids.
          The problem is the following: when these KeyValue objects (same rowid but different colids in the same column family) reach the reducer, the TreeSet used to sort the KeyValues keeps only the KeyValue that arrives last (it replaces all entries with the last one that reaches the reducer), as the KeyValue.COMPARATOR compares only the rowid!
          Can I use a different Comparator? KeyValue objects with the same rowid must be sorted before writing them to the HFile, or does this not matter?
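
          For context: a TreeSet ordered by a comparator that only inspects the row will indeed keep a single entry per row and silently drop the rest, while a full-key comparator (KeyValue.COMPARATOR orders by row, family, qualifier and timestamp in current versions) keeps every column. A minimal, self-contained sketch; the row and column values are made up for illustration:

          import java.util.Comparator;
          import java.util.TreeSet;

          import org.apache.hadoop.hbase.KeyValue;
          import org.apache.hadoop.hbase.util.Bytes;

          public class ComparatorDemo {
            public static void main(String[] args) {
              KeyValue a = new KeyValue(Bytes.toBytes("row1"), Bytes.toBytes("cf"),
                  Bytes.toBytes("col1"), Bytes.toBytes("v1"));
              KeyValue b = new KeyValue(Bytes.toBytes("row1"), Bytes.toBytes("cf"),
                  Bytes.toBytes("col2"), Bytes.toBytes("v2"));

              // A row-only comparator treats a and b as equal: the second add is
              // a no-op and one column is silently lost.
              TreeSet<KeyValue> rowOnly = new TreeSet<KeyValue>(new Comparator<KeyValue>() {
                public int compare(KeyValue l, KeyValue r) {
                  return Bytes.compareTo(l.getRow(), r.getRow());
                }
              });
              rowOnly.add(a);
              rowOnly.add(b);
              System.out.println(rowOnly.size()); // 1

              // The full key comparator keeps both columns, in sorted order.
              TreeSet<KeyValue> fullKey = new TreeSet<KeyValue>(KeyValue.COMPARATOR);
              fullKey.add(a);
              fullKey.add(b);
              System.out.println(fullKey.size()); // 2
            }
          }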

          Lars Francke added a comment -

          I have taken a stab at it. This is what I did:

          • Currently, once it is decided that an HFile has become too large, it is closed and a new one is opened. This doesn't work anymore because there may still be KeyValues for the current row coming in other column families. So now I just set a flag that an HFile rotation is needed. On every write this flag is tested, and when it is true and the row key changes I close all currently open HFiles (see the sketch after this list)
            • This gets slightly more complicated due to the fact that we only close the HFiles but don't open new ones here, because they may not be needed. So a check is still required on every write to see whether we need to open a new HFile
          • As we later need to know which files belong together in a region, I save them using the current task attempt id and a counter to guarantee their uniqueness
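
          A minimal sketch of that defer-the-roll idea follows; WriterLike, openWriter() and the size limit are illustrative stand-ins, not the actual patch internals:

          import java.io.IOException;
          import java.util.Map;
          import java.util.TreeMap;

          import org.apache.hadoop.hbase.KeyValue;
          import org.apache.hadoop.hbase.util.Bytes;

          public abstract class RollingWriterSketch {
            /** Hypothetical per-family writer handle. */
            interface WriterLike {
              void append(KeyValue kv) throws IOException;
              long length();
              void close() throws IOException;
            }

            // byte[] keys need a content comparator, hence TreeMap over HashMap.
            private final Map<byte[], WriterLike> writers =
                new TreeMap<byte[], WriterLike>(Bytes.BYTES_COMPARATOR);
            private boolean rollRequested = false;
            private byte[] previousRow = null;
            private final long maxFileSize = 256L * 1024 * 1024; // illustrative limit

            protected abstract WriterLike openWriter(byte[] family) throws IOException;

            public void write(KeyValue kv) throws IOException {
              byte[] row = kv.getRow();
              // Roll only once the row key changes, so every family's files
              // split on the same row boundary.
              if (rollRequested && previousRow != null && !Bytes.equals(previousRow, row)) {
                for (WriterLike w : writers.values()) w.close();
                writers.clear();                     // reopened lazily below
                rollRequested = false;
              }
              WriterLike writer = writers.get(kv.getFamily());
              if (writer == null) {                  // open on demand: a closed family
                writer = openWriter(kv.getFamily()); // may never see another KeyValue
                writers.put(kv.getFamily(), writer);
              }
              writer.append(kv);
              if (writer.length() > maxFileSize) {
                rollRequested = true;                // defer the actual roll
              }
              previousRow = row;
            }
          }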

          The current tests all run with my changes, which is a good sign.

          The second part is the loading of those files, which seems to be more complicated and could use some comments. HBASE-1923 recently made this more complicated and I'm not sure I fully understand it. Basically these are the changes required:

          • To create a new region we now have to look for the start and end key across all column families
          • We have to load all the column families' HFiles for a single region; those might differ between regions

          To make both steps easier I could write an additional metadata file during HFileOutputFormat which contains the start and end keys as well as all the column families that have HFiles for this region. This data is available during creation.
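
          If that route were taken, the sidecar could be as small as a few fields written next to the region's files. This sketch is purely illustrative; the file name and format are made up:

          import java.io.IOException;
          import java.util.List;

          import org.apache.hadoop.fs.FSDataOutputStream;
          import org.apache.hadoop.fs.FileSystem;
          import org.apache.hadoop.fs.Path;
          import org.apache.hadoop.hbase.util.Bytes;

          public class RegionMetaSketch {
            /** Write a tiny sidecar describing one region's output. */
            public static void writeRegionMeta(FileSystem fs, Path regionDir,
                byte[] startKey, byte[] endKey, List<String> families) throws IOException {
              FSDataOutputStream out = fs.create(new Path(regionDir, ".region.meta"));
              try {
                out.writeUTF(Bytes.toStringBinary(startKey));
                out.writeUTF(Bytes.toStringBinary(endKey));
                out.writeInt(families.size());    // families that actually have HFiles
                for (String family : families) {
                  out.writeUTF(family);
                }
              } finally {
                out.close();
              }
            }
          }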

          So any input on how this would affect/be affected by the incremental stuff would be appreciated.

          Lars Francke added a comment -

          Okay, after having talked with Lars George and looked over the incremental load stuff from Todd, I've got even more questions.
          It seems as if - and we should really document this somewhere - there are now two distinct ways to bulk load data into HBase:

          loadtable.rb creates regions manually and just creates the metadata to be picked up by the metascanner. This seems like it is not very resource intensive (after the HFiles have been generated).

          And then there's the new completebulkload tool, which shifts some of the load to HBase itself by (and please correct me if I understood this wrong) possibly splitting a lot of the existing regions and basically depending on HBase to put the HFiles in appropriate places. This is a great solution for incremental loads, as regions already exist. But is this a good solution performance- and load-wise for an empty table? My knowledge of HBase in this regard is still limited, but I would have thought that the constant splitting would be pretty bad, especially when starting with an empty table with no regions.

          I'd love your input on how to solve this: multi column family support only for empty tables via loadtable.rb, only for the incremental bulk load tool, or for both?
          This also includes the question of whether we should keep loadtable.rb if it is a better fit for "cold imports".

          Vidhyashankar Venkataraman added a comment -

          Lars,
          Are you working on this issue: in particular, are you working on the completebulkload (incremental load) tool? I thought I would take a shot at it: our group needs this tool to be up and working quite soon, and Stack pointed me to this JIRA.

          Vidhya

          Lars Francke added a comment -

          Yes and yes but I wouldn't mind if you took over. Thank you! I'll attach a patch of what I have (it is unfinished and untested but perhaps it helps anyway) in a second.

          Jonathan Gray added a comment -

          Punting to 0.92

          Nicolas Spiegelberg added a comment -

          IRC chat with Todd about this issue. I figured that the areas I was confused about might confuse others as well. Highlights:

          1. Multi-family support is almost done, just need someone to verify (yay, me)
          2. Remember that this code handles the new table + import cases
          3. HFile rolling (for HFileOutputFormat) only makes sense in the new table case
          4. Splits are accounted for in LoadIncrementalHFiles

          --------------------------------------------------
          [6:07pm] nspiegelberg: so, I'm trying to understand why multiple CF doesn't work already
          [6:08pm] nspiegelberg: Also, since we're pre-splitting, I'm trying to find all the code where bulk importing tries to split for you
          [6:08pm] tlipcon: i don't think it's that far off from working
          [6:08pm] tlipcon: just needs a little "accounting" to make sure that the hfiles all split at the same boundaries
          [6:08pm] tlipcon: I think it's just in LoadIncrementalHFiles
          [6:08pm] tlipcon: HFOF itself will split on max.region.size boundaries I think
          [6:09pm] nspiegelberg: from what I can tell, you're basically worried about when to roll HFiles and to make sure you don't roll them on the same Row
          [6:09pm] tlipcon: right
          [6:09pm] tlipcon: rolling HFiles is basically important when you underestimate the number of reducers you should be making
          [6:10pm] nspiegelberg: don't you do 1 reducer/region?
          [6:10pm] nspiegelberg: if there's no edits to a region, then that reducer is idle
          [6:11pm] tlipcon: you're only thinking about pre-created case
          [6:11pm] tlipcon: but HFOF also works for the new table case
          [6:11pm] tlipcon: and with reducer skew in that case, you'd prefer one reducer to maybe make two regions
          [6:12pm] nspiegelberg: wouldn't that be accomplished by doing a split in LoadIncrementalHFiles?
          [6:12pm] tlipcon: yea, but that split is very slow
          [6:12pm] tlipcon: it's a physical split
          [6:12pm] tlipcon: it's only there to take care of the case where you've got some splits in between running the MR and loading the hfiles
          [6:13pm] nspiegelberg: k. it sounds like my life is simplified by pre-split regions
          [6:13pm] nspiegelberg: just need to not mess up split case
          [6:25pm] nspiegelberg: hey, I am still a little fuzzy on how you're handling the case where a split happens between configureIncrementalLoad() and bulkLoadHFile()
          [6:26pm] tlipcon: nspiegelberg: the completebulkload (LoadIncrementalHFiles) deals with it
          [6:26pm] tlipcon: it's ugly, it physically splits the hfile on the new boundary
          [6:26pm] tlipcon: and adds the new ones to a queue
          [6:28pm] nspiegelberg: so... if splitting doesn't happen until LoadIncrementalHFiles, why do you need to worry that you've hit a new row when you roll HFiles?
          [6:29pm] nspiegelberg: it only makes sense to not roll HFiles until the next row when you want to use that as a split point
          [6:30pm] tlipcon: it's for new tables
          [6:30pm] tlipcon: for the actual incremental case, the rolling done by HFOF doesn't really buy you anything
          [6:30pm] tlipcon: except that it minimizes the amount of work
          [6:30pm] tlipcon: but for a new table, you might want 10 reducers, but maybe you have skew, so one of the reducers gets 5x as much data as the others
          [6:31pm] tlipcon: it should still make regions that fit your region size
          [6:31pm] nspiegelberg: well, in our case, we need to add a threshold to PutSortReducer so we don't try to put too many entries in an in-memory Map
          [6:31pm] nspiegelberg: so rolling the hfiles does make sense
          [6:35pm] tlipcon: recall that reducer only runs per row
          [6:35pm] tlipcon: it doesn't shove multiple rows in the map
          [6:35pm] tlipcon: just multiple columns
          [6:36pm] nspiegelberg: the Iterable<Put> is not a stream?
          [6:36pm] • nspiegelberg is a MR n00b
          [6:36pm] tlipcon: it is streamed, but all those puts are for the same row
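
          The PutSortReducer threshold discussed above made it into the eventual patch; a rough sketch of the idea, with an illustrative memory budget and simplified heap accounting:

          import java.io.IOException;
          import java.util.Iterator;
          import java.util.List;
          import java.util.TreeSet;

          import org.apache.hadoop.hbase.KeyValue;
          import org.apache.hadoop.hbase.client.Put;
          import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
          import org.apache.hadoop.mapreduce.Reducer;

          public class CappedSortReducerSketch
              extends Reducer<ImmutableBytesWritable, Put, ImmutableBytesWritable, KeyValue> {

            private static final long THRESHOLD = 64L * 1024 * 1024; // illustrative

            @Override
            protected void reduce(ImmutableBytesWritable row, Iterable<Put> puts,
                Context context) throws IOException, InterruptedException {
              Iterator<Put> iter = puts.iterator();
              while (iter.hasNext()) {
                TreeSet<KeyValue> sorted = new TreeSet<KeyValue>(KeyValue.COMPARATOR);
                long curSize = 0;
                // Buffer KeyValues until the puts run out or the budget is hit.
                while (iter.hasNext() && curSize < THRESHOLD) {
                  for (List<KeyValue> kvs : iter.next().getFamilyMap().values()) {
                    for (KeyValue kv : kvs) {
                      sorted.add(kv);
                      curSize += kv.getLength();
                    }
                  }
                }
                // Emit the sorted batch; every KeyValue here belongs to the same
                // row, so output stays row-major even across batches.
                for (KeyValue kv : sorted) {
                  context.write(row, kv);
                }
              }
            }
          }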

          HBase Review Board added a comment -

          Message from: "Nicolas" <nspiegelberg@facebook.com>

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.cloudera.org/r/1272/
          -----------------------------------------------------------

          Review request for hbase.

          Summary
          -------

          Support writing to multiple column families for HFileOutputFormat. Also added a max threshold for PutSortReducer because we had some pathological row cases.

          This addresses bug HBASE-1861.
          http://issues.apache.org/jira/browse/HBASE-1861

          Diffs


          src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.java 8ccdf4d
          src/main/java/org/apache/hadoop/hbase/mapreduce/PutSortReducer.java 5fb3e83
          src/test/java/org/apache/hadoop/hbase/mapreduce/TestHFileOutputFormat.java c5d56cc

          Diff: http://review.cloudera.org/r/1272/diff

          Testing
          -------

          mvn test -Dtest=TestHFileOutputFormat
          internal MR testing

          Thanks,

          Nicolas

          HBase Review Board added a comment -

          Message from: stack@duboce.net

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.cloudera.org/r/1272/#review2044
          -----------------------------------------------------------

          Ship it!

          +1 Excellent.

          src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.java
          <http://review.cloudera.org/r/1272/#comment6448>

          Should this behavior be documented in method javadoc?

          • stack
          HBase Review Board added a comment -

          Message from: "Nicolas" <nspiegelberg@facebook.com>

          On 2010-12-07 17:13:55, stack wrote:

          > src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.java, line 93

          > <http://review.cloudera.org/r/1272/diff/1/?file=17977#file17977line93>

          >

          > Should this behavior be documented in method javadoc?

          will do

          • Nicolas

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.cloudera.org/r/1272/#review2044
          -----------------------------------------------------------

          stack added a comment -

          @Nicolas Mother Hen is here. Do you want to resolve this issue now that you've committed the patch? Also, I did not see an addition made to CHANGES.txt on commit. Did you mean to add one? (On commit and resolve, we add an entry to the CHANGES.txt file that is under the hbase checkout dir.) Pardon me if I'm telling you stuff you know already. Good on you.

          Nicolas Spiegelberg added a comment -

          @stack: That stuff was already done for me on previous patches I committed, and we're lazier about CHANGES.txt internally, so I'll get that corrected and remember to double-check next time.

          stack added a comment -

          @Nicolas nm on CHANGES.txt change. I see it now. Pardon Mother Hen for fussing.

          Cosmin Lehene added a comment -

          How hard would it be to adapt this for 0.90 branch?

          Nicolas Spiegelberg added a comment -

          @Cosmin: This should apply cleanly to 0.90. I was just taking the path of least resistance until someone needed it in 0.90.

          Nichole Treadway added a comment -

          I've been testing this in 0.90.2. I found that some reduce tasks fail with a FileNotFoundException if there are no keys in the input data which would fall into the region assigned to that reduce task.

          From what I can determine, it seems that an output directory is created in the write() method and expected to exist in the writeMetaData() method. If there are no keys to be written for that reduce task, the write() method is never called, but writeMetaData() expects the output directory to exist; thus the FileNotFoundException.

          Simply checking whether the directory exists should fix the issue. I've patched this on my end; should I post my fix here?
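
          The guard she describes would presumably be a simple existence check along these lines; the method and variable names are guesses at the surrounding code, not the actual fix (which landed as HBASE-3782):

          import java.io.IOException;

          import org.apache.hadoop.fs.FileSystem;
          import org.apache.hadoop.fs.Path;

          public class WriteMetaDataGuard {
            /** Skip metadata handling when the reducer never wrote anything. */
            static void writeMetaData(FileSystem fs, Path outputDir) throws IOException {
              if (!fs.exists(outputDir)) {
                return; // no keys fell into this reducer's range; nothing to do
              }
              // ... proceed to enumerate the directory and write the metadata ...
            }
          }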

          Todd Lipcon added a comment -

          Hi Nichole. I think it's probably best to open a new JIRA for the bugfix since this one has been closed for a while.

          stack added a comment -

          Nichole added HBASE-3782

          Hudson added a comment -

          Integrated in HBase-TRUNK #1909 (See https://builds.apache.org/hudson/job/HBase-TRUNK/1909/)
          HBASE-3864 Rename of hfile.min.blocksize.size in HBASE-2899 reverted in HBASE-1861


            People

            • Assignee:
              Nicolas Spiegelberg
            • Reporter:
              Jonathan Gray
            • Votes:
              4
            • Watchers:
              11
