HIVE-3752

Add a non-SQL API in Hive to access data.

    Details

    • Type: Improvement
    • Status: Reopened
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      We would like to add an input/output format for accessing Hive data directly from Hadoop, without going through a mechanism such as a transform. Using a transform means running a whole map-reduce step, with its own disk accesses and its imposed structure. It also means making Hive the base infrastructure for the entire system being developed, which is not the right fit because we only need a small part of it (access to the data).

      So we propose adding an API-level InputFormat and OutputFormat to Hive that make it easy to select a table with a partition spec and read from or write to it. We chose this design for compatibility with Hadoop, so that existing systems that work with Hadoop's IO API will work out of the box.
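
To make the idea of "select a table with a partition spec" concrete, here is a minimal sketch in plain Java. It is not the proposed API: the class name HiveInputSketch, the configuration keys, and the table and partition names are all hypothetical, and a plain Map stands in for Hadoop's job Configuration so the example runs without Hadoop or Hive on the classpath. It only illustrates the likely shape of the flow: the caller records a table and partition spec in the job configuration, and the input format later reads that selection back when it plans its work.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch of describing which Hive table and partitions a job reads.
 * A plain Map stands in for Hadoop's job Configuration.
 */
final class HiveInputSketch {
    // Hypothetical configuration keys, not real Hive property names.
    static final String DB_KEY = "sketch.hive.input.database";
    static final String TABLE_KEY = "sketch.hive.input.table";
    static final String PARTITION_KEY = "sketch.hive.input.partition";

    /** Renders a partition spec such as {ds=2012-12-01} as "ds='2012-12-01'". */
    static String partitionFilter(Map<String, String> spec) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : spec.entrySet()) {
            if (sb.length() > 0) {
                sb.append(" AND ");
            }
            sb.append(e.getKey()).append("='").append(e.getValue()).append("'");
        }
        return sb.toString();
    }

    /** Stores the table selection in the (stand-in) job configuration. */
    static void setInput(Map<String, String> conf, String database, String table,
                         Map<String, String> partitionSpec) {
        conf.put(DB_KEY, database);
        conf.put(TABLE_KEY, table);
        conf.put(PARTITION_KEY, partitionFilter(partitionSpec));
    }

    /** Reads the selection back, as an input format would before computing splits. */
    static String describeInput(Map<String, String> conf) {
        String name = conf.get(DB_KEY) + "." + conf.get(TABLE_KEY);
        String filter = conf.getOrDefault(PARTITION_KEY, "");
        return filter.isEmpty() ? name : name + " [" + filter + "]";
    }

    public static void main(String[] args) {
        Map<String, String> conf = new LinkedHashMap<>();
        Map<String, String> spec = new LinkedHashMap<>();
        spec.put("ds", "2012-12-01"); // hypothetical partition column and value
        setInput(conf, "default", "edges", spec);
        System.out.println(describeInput(conf)); // prints: default.edges [ds='2012-12-01']
    }
}
```

Keeping the selection in the job configuration is one plausible way to stay compatible with Hadoop's IO API, since that is how existing input formats receive their settings; the actual design is in the linked review.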

      We need this for the Giraph graph processing system (http://giraph.apache.org/), as running graph jobs that read from and write to Hive is a common use case.

      [~namitjain], Avery Ching, Kevin Wilfong, Alessandro Presta

      Input-side (HiveApiInputFormat) review: https://reviews.facebook.net/D7401

            People

            • Assignee:
              Nitay Joffe
              Reporter:
              Nitay Joffe
            • Votes:
              0
              Watchers:
              9
