HIVE-1295

facilitate HBase bulk loads from Hive

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.6.0
    • Fix Version/s: 0.6.0
    • Component/s: HBase Handler
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      HBase supports a bulk load procedure:

      http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#bulk

      We would like to add support to Hive so that users can bulk load HBase from Hive without having to write any map/reduce code.

      Ideally, this could be done with a single INSERT statement targeting the HBase storage handler (with an option set to request bulk load instead of row-level inserts).

      However, that will take a lot of work, so this JIRA is a first step to allow the bulk load files to be prepared inside of Hive via a sequence of SQL statements and then pushed into HBase via the loadtable.rb script.

      Note that until HBASE-1861 is implemented, the bulk load target table can only have a single column family.

          Activity

          John Sichi added a comment -

          I'm still working on testing out the procedure beyond toy cases; here's the doc:

          http://wiki.apache.org/hadoop/Hive/HBaseBulkLoad

          John Sichi added a comment -

          Notes on the patch:

          • After the recent Apache JIRA downtime, the ASF copyright license assignment radio button is no longer here in the UI, so I couldn't click that to grant to Apache, but I hereby grant it etc etc (and I'm a Facebook employee anyway, so I think my contribution is covered under the corporate license assignment).
          • The unit test requires MiniMR mode (instead of local mode) because it needs multiple reducers in order for the custom partitioner to kick in. Accordingly, I named the file with a .m suffix (instead of .q) to keep it segregated from the other unit tests. To run it, use -Dtestcase=TestHBaseMinimrCliDriver.
          • I had to make one change to the classpath definition in build-common.xml to avoid a webapp ordering issue from HBase and Hadoop being loaded in the same Jetty instance.
          • I made one change to ql/build.xml's classpath (adding jsp-2.1 jars). This was not actually needed for my hbase-handler test to run, but it fixes an existing breakage with running ql tests with -Dclustermode=miniMR, needed for HIVE-117. (The job tracker wasn't starting because it wasn't able to load some JSP API classes.)
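
The point above about needing multiple reducers can be illustrated with a minimal sketch. It assumes a range-style partitioner that routes each row key to a reducer by comparing it against sorted split keys; this is an illustration only, not the partitioner used in the patch, and the function name, split keys, and row keys are all hypothetical.

```python
from bisect import bisect_right


def range_partition(row_key, split_keys):
    """Return the reducer index for row_key.

    split_keys must be sorted. With N split keys there are N + 1 reducers;
    reducer i receives the keys in [split_keys[i - 1], split_keys[i]).
    """
    return bisect_right(split_keys, row_key)


# Hypothetical split keys and row keys, chosen only for illustration.
split_keys = ["g", "n", "t"]
rows = ["apple", "grape", "mango", "nectarine", "zucchini"]

assignments = {key: range_partition(key, split_keys) for key in rows}

# With no split keys there is only one reducer, so every key maps to
# reducer 0 and the partitioning logic is never really exercised.
single_reducer = {range_partition(key, []) for key in rows}
```

With a single reducer every key lands in partition 0, so a test in that setting cannot tell a working partitioner from a broken one. That is consistent with the note above that the test needs MiniMR mode rather than local mode.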
          John Sichi added a comment -

          The copyright grant issue is INFRA-2605.

          Namit Jain added a comment -

          I will take a look

          Namit Jain added a comment -

          Committed. Thanks John

          John Sichi added a comment -

          While testing on a different data set, I found some bugs, so I'll submit a followup patch for that.

          John Sichi added a comment -

          Followup logged as HIVE-1321.


            People

            • Assignee:
              John Sichi
              Reporter:
              John Sichi