Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-2590

HBase bulk load wiki page improvements

    XMLWordPrintableJSON

Details

    • wiki

    Description

      Some suggestions on the page https://cwiki.apache.org/confluence/display/Hive/HBaseBulkLoad which seems kind of out of date:

      1. It seems like it's required that the number of reduce tasks in the "Sort Data" phase be one more than the number of keys selected in the "Range Partitioning" step, or else you get an error like this:

      Caused by: java.lang.IllegalArgumentException: Can't read partitions file
      at org.apache.hadoop.mapred.lib.TotalOrderPartitioner.configure(TotalOrderPartitioner.java:91)
      ... 15 more
      Caused by: java.io.IOException: Wrong number of partitions in keyset
      at org.apache.hadoop.mapred.lib.TotalOrderPartitioner.configure(TotalOrderPartitioner.java:72)
      ... 15 more

      If so, it would be helpful if this was explicitly pointed out.

      2. It recommends that you should use the "loadtable" ruby script to put data into hbase, but if you run this on newer versions of HBase (e.g. 0.90.3) it errors:

      DISABLED!!!! Use completebulkload instead. See tail of http://hbase.apache.org/bulk-loads.html

      The instructions should probably be changed to use completebulkload instead of this script.

      Attachments

        Activity

          People

            xodarap Ben West
            xodarap Ben West
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: