Hive / HIVE-1541

More general dataflow execution backend

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      With the recent open source release of Mesos (http://github.com/mesos/mesos), experimentation at the query execution layer has become more feasible. It would be interesting to explore a more general-purpose dataflow execution backend for Hive queries, in the spirit of systems like Volcano, Dryad, and Dremel. One potential backend is the Hyracks project from UCI: http://code.google.com/p/hyracks.

        Activity

        Jeff Hammerbacher added a comment -

        In particular, it would be nice to avoid the startup overhead of Hadoop MapReduce with this backend.

        Venkatesh Seetharam added a comment -

        Oozie should be a good candidate as well.

        Jeff Hammerbacher added a comment -

        Hey Venkatesh,

        HIVE-1107 is aimed at getting Hive and Pig to express their sequence of MapReduce jobs as an Oozie workflow. For this JIRA, I meant an entirely different initialization routine and set of physical operators, similar to those used by an MPP relational database or Dremel. Whether Oozie is used to describe the workflow tying together these new physical operators is less of a concern to me.

        Thanks,
        Jeff


          People

          • Assignee:
            Unassigned
            Reporter:
            Jeff Hammerbacher
          • Votes:
            0
            Watchers:
            20
