  Hive / HIVE-9452

Use HBase to store Hive metadata

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: hbase-metastore-branch
    • Fix Version/s: None
    • Component/s: Metastore
    • Labels: None

    Description

      This is an umbrella JIRA for a project to explore using HBase to store the Hive data catalog (i.e., the metastore). This project has several goals:

      1. The current metastore implementation is slow when tables have thousands of partitions or more. With the Tez and Spark engines we are pushing Hive to the point where queries take only a few seconds to run, yet planning a query can take as long as running it. Much of that planning time is spent in metadata operations.
      2. Due to scale limitations we have never allowed tasks to communicate directly with the metastore. With the development of LLAP, however, this restriction will have to be relaxed. If it can be relaxed, other use cases could benefit as well.
      3. Eating our own dog food. Rather than relying on external systems to store our metadata, there are benefits to using other components in the Hadoop ecosystem.

      The proposal is to create a new branch and work on the prototype there.

      Attachments

        1. HBaseMetastoreApproach.pdf
          945 kB
          Alan Gates

        Issue Links

          1. Initial patch [hbase-metastore branch] (Sub-task, Closed, Alan Gates)
          2. Add support for getDatabases and alterDatabase calls [hbase-metastore branch] (Sub-task, Closed, Alan Gates)
          3. Support all get tables [hbase-metastore branch] (Sub-task, Closed, Alan Gates)
          4. Need a tool to export metadata from RDBMS based metastore into HBase (Sub-task, Closed, Alan Gates)
          5. Fill out remaining partition functions in HBaseStore (Sub-task, Closed, Alan Gates)
          6. Implement privileges call in HBaseStore (Sub-task, Closed, Alan Gates)
          7. Introduce a stats cache for aggregate stats in HBase metastore [hbase-metastore branch] (Sub-task, Closed, Vaibhav Gumashta)
          8. Need to add indices and privileges to HBaseImport and HBaseSchemaTool [hbase-metastore branch] (Sub-task, Open, Alan Gates)
          9. Documentation for HBase metastore (Sub-task, Open, Alan Gates)
          10. Remove M* classes from RawStore interface (Sub-task, Closed, Alan Gates)
          11. Move serialization of objects in HBase to protocol buffers (Sub-task, Closed, Alan Gates)
          12. Refactor HBaseReadWrite to allow different implementations underneath (Sub-task, Closed, Alan Gates)
          13. Need a way to record time spent in various metastore functions (Sub-task, Open, Alan Gates)
          14. Perform stats aggregation in HBase co-processor [hbase-metastore branch] (Sub-task, Open, Unassigned)
          15. Investigate ways to improve NDV calculations during stats aggregation [hbase-metastore branch] (Sub-task, Resolved, Vaibhav Gumashta)
          16. Implement functions methods in HBaseStore [hbase-metastore branch] (Sub-task, Closed, Alan Gates)
          17. Add connection manager for Tephra (Sub-task, Closed, Alan Gates)
          18. Generate Hbase execution plan for partition filter conditions in HbaseStore api calls - initial changes (Sub-task, Closed, Thejas Nair)
          19. Optimize handling of partition condition expression that result in invalid filter string (Sub-task, Closed, Unassigned)
          20. get metatool to work with hbase metastore (Sub-task, Open, Unassigned)
          21. Fix test failure in TestAggregateStatsCache (Sub-task, Resolved, Vaibhav Gumashta)
          22. Support filter on non-first partition key and non-string partition key (Sub-task, Closed, Daniel Dai)
          23. Add tests for partition level statistics + refactor stats tests of TestHBaseStore [hbase-metastore branch] (Sub-task, Resolved, Vaibhav Gumashta)
          24. Unit test against HBase Metastore (Sub-task, Closed, Daniel Dai)
          25. Describe a non-partitioned table fail (Sub-task, Closed, Alan Gates)
          26. Get partial stats instead of complete stats in some queries (Sub-task, Closed, Vaibhav Gumashta)
          27. Fix Unit test failures when HBase Metastore is used (Sub-task, Open, Unassigned)
          28. Fix alter related Unit tests for HBase metastore (Sub-task, Open, Unassigned)
          29. Fix stats related unit tests for HBase metastore (Sub-task, Open, Unassigned)
          30. Fix partitions related unit tests for HBase metastore (Sub-task, Open, Unassigned)
          31. Invalidate aggregate column stats on alter partition (Sub-task, Closed, Alan Gates)
          32. implement file footer / splits cache in HBase metastore (Sub-task, Open, Sergey Shelukhin)
          33. merge master into branch (Sub-task, Closed, Sergey Shelukhin)
          34. merge master into branch (Sub-task, Closed, Sergey Shelukhin)
          35. Fix TestMiniTezCliDriver test failures when HBase Metastore is used (Sub-task, Closed, Daniel Dai)
          36. import tool is too brittle (fails on existing role) (Sub-task, Open, Unassigned)
          37. import tool should print help by default (Sub-task, Closed, Sergey Shelukhin)
          38. import tool fails on non-secure cluster (Sub-task, Open, Unassigned)
          39. NPE in stats conversion with HBase metastore (Sub-task, Closed, Sergey Shelukhin)
          40. Fix UT regressions on hbase-metastore branch (Sub-task, Closed, Daniel Dai)
          41. Exclude hbase-metastore for hadoop-1 (Sub-task, Closed, Daniel Dai)
          42. Merge hbase-metastore branch to trunk (Sub-task, Closed, Daniel Dai)
          43. HBaseImport should import basic stats and column stats (Sub-task, Open, Daniel Dai)
          44. Exclude hbase-metastore in itests for hadoop-1 (Sub-task, Closed, Daniel Dai)
          45. HBase Port conflict for MiniHBaseCluster (Sub-task, Closed, Daniel Dai)
          46. Renable stats_filemetadata.q test case (Sub-task, Open, Unassigned)

            People

              Assignee: Alan Gates
              Reporter: Alan Gates
              Votes: 7
              Watchers: 52

              Dates

                Created:
                Updated: