  1. Hive
  2. HIVE-7604

Add Metastore API to fetch one or more partition names

Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Metastore
    • Labels: None

    Description

      We need a new Metastore API to address the following use cases. Both arise from tables with hundreds of thousands, or in some cases millions, of partitions.

      1. It should be quick and easy to obtain the distinct values of a partition key, e.g. all dates for which partitions are available. Tools and frameworks can use this programmatically to find gaps in partitions before reprocessing them. Currently one has to run Hive queries (via JDBC or the CLI) to obtain this information, which is unfriendly and heavyweight. For tables with a large number of partitions, these queries take a long time to run and require a large heap.

      2. Users typically want to know which partitions are available and run queries that involve only partition keys (select distinct partkey1 from table), or want the latest date partition of a dimension table to join against a fact table (select * from fact_table join select max(dt) from dimension_table). Such metadata-only queries can be pushed to the metastore and need not be run in Hive at all, even locally. If they can be converted into queries against the metastore's backing database, clients can stay lightweight and need not fetch all partition names, and results can be obtained much faster with fewer resources.
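
The two use cases above can be illustrated with a small client-side sketch of the intended semantics. It is not the proposed API: the class and method names (PartitionValues, distinctValues, maxValue) are hypothetical, and it assumes partition names in the usual key=value/key=value form with no escaped characters. A real implementation would push this work into the metastore's backing database instead of fetching every partition name, which is the point of this request.

```java
import java.util.List;
import java.util.Optional;
import java.util.TreeSet;

/** Hypothetical helper for illustration only; not part of Hive. */
class PartitionValues {

    /** Use case 1: sorted distinct values of one partition key. */
    static TreeSet<String> distinctValues(List<String> partitionNames, String key) {
        TreeSet<String> values = new TreeSet<>();
        String prefix = key + "=";
        for (String name : partitionNames) {
            // A partition name is assumed to look like "dt=2014-07-01/region=us".
            for (String component : name.split("/")) {
                if (component.startsWith(prefix)) {
                    values.add(component.substring(prefix.length()));
                }
            }
        }
        return values;
    }

    /** Use case 2: largest value of a key, compared as strings (like max(dt) on a string column). */
    static Optional<String> maxValue(List<String> partitionNames, String key) {
        TreeSet<String> values = distinctValues(partitionNames, key);
        return values.isEmpty() ? Optional.empty() : Optional.of(values.last());
    }

    public static void main(String[] args) {
        List<String> names = List.of(
            "dt=2014-07-01/region=us",
            "dt=2014-07-01/region=eu",
            "dt=2014-07-03/region=us");
        System.out.println(distinctValues(names, "dt"));          // [2014-07-01, 2014-07-03]
        System.out.println(maxValue(names, "dt").orElse("none")); // 2014-07-03
    }
}
```

In this sample the missing 2014-07-02 is the kind of gap a reprocessing tool would look for. String comparison matches date order here only because the values are in yyyy-MM-dd form.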

      Attachments

        1. Design_HIVE_7604.txt
          2 kB
          Thiruvel Thirumoolan
        2. Design_HIVE_7604.1.txt
          3 kB
          Thiruvel Thirumoolan

        Activity

          People

            Assignee: Thiruvel Thirumoolan
            Reporter: Thiruvel Thirumoolan
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated: