Hive / HIVE-3565

Use HBase tables for writing intermediate directories across map-reduce boundaries

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Activity

      Namit Jain added a comment:

      Consider a query like:

      select B.y, count(1) from
      A join B on A.x=B.x
      group by B.y;

      This requires two MR jobs: the first performs the join, and the second performs the group by (note that the second job would have an
      identity mapper). If the first MR job wrote the join output to an HBase table keyed by B.y, the second job could be a map-only job that
      simply scans the HBase table; because HBase stores rows sorted by row key, rows with the same B.y value would already be adjacent. This
      idea can be extended to joins as well.
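      A minimal Python sketch of the two plans, simulated in memory on hypothetical toy tables A(x) and B(x, y). The HBase table is
      emulated by a sorted list; the (B.y, sequence number) row key and the split boundaries aligned to B.y groups are assumptions made
      for illustration, not part of the proposal or of any HBase API. It shows why the second job needs no shuffle once rows are sorted
      by B.y: each mapper can count runs of equal keys within its own contiguous key range.

```python
from itertools import groupby

# Hypothetical toy tables: A has column x; B has columns (x, y).
A = [(1,), (2,), (2,), (3,)]
B = [(1, "red"), (2, "blue"), (3, "red"), (4, "green")]


def join_on_x(a_rows, b_rows):
    """MR job 1, simplified to an in-memory hash join: emit B.y for each A.x = B.x match."""
    b_by_x = {}
    for x, y in b_rows:
        b_by_x.setdefault(x, []).append(y)
    for (x,) in a_rows:
        for y in b_by_x.get(x, []):
            yield y


def count_with_shuffle(join_output):
    """Current plan: job 2 has an identity mapper; the shuffle groups rows by B.y and reducers count.
    A dict keyed by B.y stands in for the shuffle's grouping here."""
    counts = {}
    for y in join_output:
        counts[y] = counts.get(y, 0) + 1
    return counts


def write_sorted_table(join_output):
    """Emulate writing the join output to an HBase-like table keyed by B.y.
    Assumption: the row key is (B.y, sequence number) so duplicate B.y values do not
    overwrite each other; like HBase, the table is kept sorted by row key."""
    rows = [((y, seq), None) for seq, y in enumerate(join_output)]
    rows.sort(key=lambda kv: kv[0])
    return rows


def split_on_group_boundaries(table, num_splits):
    """Hypothetical: cut the sorted table into contiguous splits that never divide a B.y group."""
    groups = [list(g) for _, g in groupby(table, key=lambda kv: kv[0][0])]
    size = max(1, -(-len(groups) // num_splits))  # ceiling division
    return [[row for group in groups[i:i + size] for row in group]
            for i in range(0, len(groups), size)]


def map_only_count(split):
    """One mapper: scan a contiguous key range and count runs of equal B.y (no shuffle)."""
    return {y: sum(1 for _ in rows) for y, rows in groupby(split, key=lambda kv: kv[0][0])}


def count_map_only(join_output, num_splits=2):
    """Proposed plan: job 1 writes a sorted table; job 2 is map-only over its splits."""
    table = write_sorted_table(join_output)
    result = {}
    for split in split_on_group_boundaries(table, num_splits):
        result.update(map_only_count(split))  # splits hold disjoint B.y values, so no merge
    return result


joined = list(join_on_x(A, B))
print(count_with_shuffle(joined))
print(count_map_only(joined))
```

      Both plans produce the same counts for the toy data. The sketch glosses over split alignment: in a real HBase table, a B.y group
      stored under composite row keys could span two regions, so a map-only plan would need either splits cut on group boundaries or a
      small merge step for the boundary keys.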


        People

        • Assignee: Namit Jain
        • Reporter: Namit Jain
        • Votes: 0
        • Watchers: 2
