  Hive / HIVE-24263

Create an HMS endpoint to list partition locations


    Details

      Description

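      In our company, we need a fast way to get a list of partition locations. Currently this is done via listPartitions, which is very heavy in terms of memory and performance because it returns full Partition objects (including storage descriptors and parameters) when only the location is needed.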

      This JIRA proposes an API: Map<String, String> listPartitionLocations(String db, String table, short max) that returns a map of partition names to locations.
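A minimal sketch of the proposed call's contract, to make the expected result shape concrete. The interface mirrors the signature in the proposal but is hypothetical, as is the in-memory implementation; treating a negative `max` as "no limit" is an assumption borrowed from the listPartitions convention.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical view of the proposed endpoint; not an existing Hive API. */
interface PartitionLocationLister {
    /**
     * Returns partition name -> storage location for db.table.
     * Assumption: max < 0 means "no limit".
     */
    Map<String, String> listPartitionLocations(String db, String table, short max);
}

/** Toy in-memory stand-in that only illustrates the result shape. */
class InMemoryPartitionLocationLister implements PartitionLocationLister {
    // Key: "db.table"; value: partition name -> location, in insertion order.
    private final Map<String, Map<String, String>> tables = new LinkedHashMap<>();

    void addPartition(String db, String table, String partName, String location) {
        tables.computeIfAbsent(db + "." + table, k -> new LinkedHashMap<>())
              .put(partName, location);
    }

    @Override
    public Map<String, String> listPartitionLocations(String db, String table, short max) {
        Map<String, String> all = tables.getOrDefault(db + "." + table, Map.of());
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : all.entrySet()) {
            if (max >= 0 && result.size() >= max) {
                break; // stop once the caller's limit is reached
            }
            result.put(e.getKey(), e.getValue());
        }
        return result;
    }
}
```

The point of the design is that the caller receives only two strings per partition, instead of a full Partition object per partition.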

      For example, we have an integration in which the output of a Hive pipeline feeds Spark jobs that read directly from HDFS. The Spark job scheduler needs to know which partition paths are available for consumption (the partition name is not sufficient, since the job's input is an HDFS path), so we currently have to make a heavy listPartitions() call for this.
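A rough sketch of how a scheduler could consume a name-to-location map like the one proposed: parse each partition name and keep the locations whose date column is at or after a watermark. The class, the `ds` column, and the watermark are hypothetical, and the parser assumes partition values contain no escaped characters (real Hive partition names can escape characters such as `/` and `=`).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical scheduler helper; not part of Hive or Spark. */
class ConsumablePaths {
    /** Parses "ds=2020-10-01/hr=00" into {ds=2020-10-01, hr=00}. Assumes no escaped characters. */
    static Map<String, String> parsePartitionName(String partName) {
        Map<String, String> spec = new LinkedHashMap<>();
        for (String kv : partName.split("/")) {
            int eq = kv.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("Malformed partition name: " + partName);
            }
            spec.put(kv.substring(0, eq), kv.substring(eq + 1));
        }
        return spec;
    }

    /** Returns locations whose date column is >= watermark; yyyy-MM-dd strings sort chronologically. */
    static List<String> pathsSince(Map<String, String> nameToLocation, String dateCol, String watermark) {
        List<String> paths = new ArrayList<>();
        for (Map.Entry<String, String> e : nameToLocation.entrySet()) {
            String value = parsePartitionName(e.getKey()).get(dateCol);
            if (value != null && value.compareTo(watermark) >= 0) {
                paths.add(e.getValue()); // the HDFS path is what the Spark job actually reads
            }
        }
        return paths;
    }
}
```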

      Another use case is an HDFS data-removal tool that does a nightly crawl to check whether any Hive partitions are mapped to a given partition path. This nightly crawl could be much less resource-intensive with a listPartitionLocations() call.
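A minimal sketch of the check such a crawler might run: collect the registered partition locations into a set, then report crawled paths that no partition points to. The class and method names are hypothetical. Paths are compared as plain strings after stripping trailing slashes; a real tool would also need to qualify paths consistently (scheme, authority) before comparing.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper for a data-removal crawler; not part of Hive. */
class OrphanPathFinder {
    // Strip trailing slashes so ".../ds=1/" and ".../ds=1" compare equal.
    private static String normalize(String path) {
        String p = path;
        while (p.length() > 1 && p.endsWith("/")) {
            p = p.substring(0, p.length() - 1);
        }
        return p;
    }

    /** Returns crawled paths that no Hive partition location refers to. */
    static List<String> findOrphans(Collection<String> crawledPaths, Map<String, String> nameToLocation) {
        Set<String> registered = new HashSet<>();
        for (String loc : nameToLocation.values()) {
            registered.add(normalize(loc));
        }
        List<String> orphans = new ArrayList<>();
        for (String path : crawledPaths) {
            if (!registered.contains(normalize(path))) {
                orphans.add(path); // candidate for removal, subject to the tool's own policy
            }
        }
        return orphans;
    }
}
```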

      Because ObjectStore already has an internal method for this (used by dropPartitions), it is only a matter of exposing this API through HiveMetaStoreClient.

        Attachments

        1. HIVE-24263.patch (1.13 MB), Szehon Ho



              Time Tracking

              Estimated: Not Specified
              Remaining: 0h
              Logged: 40m
