HBase / HBASE-3240

Improve documentation of importtsv and bulk loads

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.92.0
    • Fix Version/s: 0.92.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Right now our bulk load features are a little confusing. We have loadtable.rb for new tables and completebulkload for existing tables. The docs only talk about the incremental case, and there are basically no docs for the ruby script. We should consolidate these things and make the documentation a little clearer on the full story.
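      As background, the bulk load flow the docs should spell out has two steps: a MapReduce job first writes HFiles via HFileOutputFormat, and completebulkload then hands those files off to the table's regions. Below is a minimal sketch of the preparation step; the table name, output path, and class name are placeholders, not from this issue, and the mapper/input setup is omitted:

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.Path;
          import org.apache.hadoop.hbase.HBaseConfiguration;
          import org.apache.hadoop.hbase.client.HTable;
          import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
          import org.apache.hadoop.mapreduce.Job;
          import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

          public class PrepareBulkLoad {
            public static void main(String[] args) throws Exception {
              Configuration conf = HBaseConfiguration.create();
              Job job = new Job(conf, "prepare-bulkload");   // mapper and input setup omitted
              HTable table = new HTable(conf, "mytable");    // hypothetical table name
              // Wires in the reducer, partitioner and HFileOutputFormat so the job
              // emits HFiles partitioned to match the table's current region boundaries.
              HFileOutputFormat.configureIncrementalLoad(job, table);
              FileOutputFormat.setOutputPath(job, new Path("/tmp/bulkload-output"));
              System.exit(job.waitForCompletion(true) ? 0 : 1);
            }
          }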

      1. hbase-3240.0.patch
        10 kB
        Aaron T. Myers
      2. hbase-3240.1.patch
        11 kB
        Aaron T. Myers


          Activity

          Todd Lipcon added a comment -

          There's also some question as to whether loadtable.rb will work at all on the new master, since we no longer have a MetaScanner chore.. if that's the case, this will have to move to be an 0.90 blocker.

          stack added a comment -

          loadtable.rb does not work in 0.90.

          stack added a comment -

          The loadtable.rb script does this on invocation:

          puts 'DISABLED!!!! Use completebulkload instead.  See tail of http://hbase.apache.org/bulk-loads.html'
          

           To close this issue, the docs need a bit of an update.
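           For reference, completebulkload is backed by the LoadIncrementalHFiles class, so the same load step can also be driven from Java. A minimal sketch, assuming a directory of HFiles written earlier by an HFileOutputFormat job (the path, table name, and class name are placeholders):

               import org.apache.hadoop.conf.Configuration;
               import org.apache.hadoop.fs.Path;
               import org.apache.hadoop.hbase.HBaseConfiguration;
               import org.apache.hadoop.hbase.client.HTable;
               import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

               public class CompleteBulkLoad {
                 public static void main(String[] args) throws Exception {
                   Configuration conf = HBaseConfiguration.create();
                   // Directory of HFiles produced by a prior HFileOutputFormat job.
                   Path hfofDir = new Path("/tmp/bulkload-output");
                   HTable table = new HTable(conf, "mytable");   // existing table to load into
                   // Assigns each HFile to the region that owns its key range.
                   new LoadIncrementalHFiles(conf).doBulkLoad(hfofDir, table);
                 }
               }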

          Doug Meil added a comment -

          We'll do the docs on this.

          Aaron T. Myers added a comment -

          Hey Doug, any update on these docs? If you don't have time to get to these, do you mind if I reassign the issue?

          Doug Meil added a comment -

          Aaron, sorry about this. The guy on our team that was going to do this was swamped, so I re-assigned this to you.

          Go for it!

          Aaron T. Myers added a comment -

          Here's a patch which updates/reorganizes the documentation regarding HBase's bulk load features. Nothing in the previous documentation was in fact incorrect, but this patch makes a few things more explicit.

          Todd Lipcon added a comment -

          + this data into HBase. This tool by default takes care of both steps of
          + bulk loading as described above. This tool is available by running

          importtsv doesn't complete the bulk load, does it?

          • -Dimporttsv.timestamp=currentTimeAsLong - use the specified timestamp for the import
            why'd you remove this? it no longer exists?
          Aaron T. Myers added a comment -

          importtsv doesn't complete the bulk load, does it?

          My bad. I misunderstood the implementation of importtsv as always using bulk loads, but optionally not actually doing complete. In fact, it either prepares HFOFs for bulk load, or just uses the HBase put API directly. I'll update the patch to make this more clear.

          why'd you remove this? it no longer exists?

          It was duplicated two lines down. Note this page: http://hbase.apache.org/bulk-loads.html
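           To illustrate the distinction, here is a rough sketch of how the output mode could be chosen based on the importtsv.bulk.output property. This is not the actual ImportTsv source; the class and method names are illustrative only:

               import org.apache.hadoop.conf.Configuration;
               import org.apache.hadoop.fs.Path;
               import org.apache.hadoop.hbase.client.HTable;
               import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
               import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
               import org.apache.hadoop.mapreduce.Job;
               import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

               // Rough sketch only; not taken from ImportTsv.java.
               public class ImportTsvOutputModeSketch {
                 static void configureOutput(Configuration conf, Job job, String tableName)
                     throws Exception {
                   String hfileOutPath = conf.get("importtsv.bulk.output");
                   if (hfileOutPath != null) {
                     // Bulk-load mode: write HFiles for a later completebulkload pass.
                     HTable table = new HTable(conf, tableName);
                     FileOutputFormat.setOutputPath(job, new Path(hfileOutPath));
                     HFileOutputFormat.configureIncrementalLoad(job, table);
                   } else {
                     // Direct mode: writes go straight to the live table via the Put API.
                     TableMapReduceUtil.initTableReducerJob(tableName, null, job);
                     job.setNumReduceTasks(0);
                   }
                 }
               }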

          Aaron T. Myers added a comment -

          Updated patch to address Todd's comment.

          Todd Lipcon added a comment -

          one more thing: you've edited the usage information in the docs, but not in ImportTsv.java itself. The usage info printed on the command line should match what's shown in the docs.

          Todd Lipcon added a comment -

          nm, was looking at the wrong source dir. I see you have done that. Will commit.

          Todd Lipcon added a comment -

          Committed to trunk, thanks atm.

          Hudson added a comment -

          Integrated in HBase-TRUNK #2006 (See https://builds.apache.org/job/HBase-TRUNK/2006/)
          HBASE-3240 Improve documentation of importtsv and bulk loads.

          todd :
          Files :

          • /hbase/trunk/CHANGES.txt
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java
          • /hbase/trunk/src/site/xdoc/bulk-loads.xml
          Lars George added a comment -

          I think the "-c" is wrong? The GenericOptionsParser only takes "-conf":

              Option oconf = OptionBuilder.withArgName("configuration file")
              .hasArg()
              .withDescription("specify an application configuration file")
              .create("conf");
          

          Same in trunk https://github.com/apache/hadoop/blob/trunk/src/core/org/apache/hadoop/util/GenericOptionsParser.java#L206

          Or is this handled somewhere else?
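           For comparison, -conf (like the other generic Hadoop options) is normally honored by running the command-line arguments through GenericOptionsParser before the tool parses what remains. A minimal sketch of that pattern:

               import org.apache.hadoop.conf.Configuration;
               import org.apache.hadoop.hbase.HBaseConfiguration;
               import org.apache.hadoop.util.GenericOptionsParser;

               public class GenericOptionsSketch {
                 public static void main(String[] args) throws Exception {
                   Configuration conf = HBaseConfiguration.create();
                   // Consumes generic options such as -conf and -D, applies them to conf,
                   // and returns whatever arguments are left for the tool itself.
                   String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
                   System.out.println("remaining args: " + java.util.Arrays.toString(otherArgs));
                 }
               }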


            People

            • Assignee:
              Aaron T. Myers
              Reporter:
              Todd Lipcon
            • Votes:
              0
              Watchers:
              2