HIVE-7604: Add Metastore API to fetch one or more partition names

Overview of the API, the request and response structures, and some of the decisions behind them.

API:
----
Thrift API:
PartitionValuesResponse get_partition_values(1:PartitionValuesRequest request) throws(1:MetaException o1, 2:NoSuchObjectException o2);

HiveMetaStoreClient will have APIs like the following to make it easy to use.
listPartitionValuesByFilter(dbName, tblName, list pKeys)
listPartitionValuesByFilter(dbName, tblName, pKey)
.. other APIs with ascending, maxparts arguments etc...

Thrift Request:
---------------
struct PartitionValuesRequest {
  1: required string dbName,
  2: required string tblName,
  3: required list<FieldSchema> partitionKeys;
  4: optional bool applyDistinct = true;
  5: optional string filter;
  6: optional list<FieldSchema> partitionOrder;
  7: optional bool ascending = true;
  8: optional i64 maxParts = -1;
}

Thoughts:
~~~~~~~~~
1. partitionKeys - Uses FieldSchema because it also carries the data type (in case we would like to cast the values differently).
2. applyDistinct - Most use cases only need distinct values, but an optional control is left in for turning distinct off.
3. partitionOrder - Defaults to the order of partitionKeys if not specified. Not sure how many use cases need this; it is left in since it does not seem hard to implement.
4. ascending - For queries like 'select max(dt)', one can set maxParts to 1 and ascending to false to get the latest date.
5. maxParts - All other APIs use i16 for maxParts, which does not work when the number of partitions exceeds the i16 range (32,767), as is the case for many of our datasets.

Thrift Response:
---------------
struct PartitionValuesRow {
  1: required list<string> row;
}

struct PartitionValuesResponse {
  1: required list<PartitionValuesRow> partitionValues;
}

Thoughts:
~~~~~~~~~
PartitionValuesRow contains one row of partition values. If the user selects two partition keys (say dt, market), then the size of each row will be 2 (example row: ["20140101", "us"]).
The position of each value in a row matches the order of the partition keys in the request. This differs from get_partition_names(), which returns a list<string> where each entry has the form "key1=val1/key2=val2" and has to be parsed on the client side. Returning a list for each row avoids that client-side parsing.

All partition values will be converted to String irrespective of their data type. That is simpler than using a union covering all data types.

PartitionValuesResponse is basically a list of all partition rows. For example, with two partition keys (dt, market) and two matching partitions, it will contain two rows, say {["20140101", "us"], ["20140101", "in"]}.
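
The request options described above can be illustrated with a small in-memory sketch. This is not the metastore implementation: the Python dataclass is a hypothetical stand-in for the Thrift struct, plain key names replace FieldSchema, the filter field is left out, and parse_partition_name ignores the escaping Hive applies to special characters in partition names. It only shows how applyDistinct, partitionOrder, ascending and maxParts are meant to combine, and why rows are easier to consume than "key1=val1/key2=val2" strings.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PartitionValuesRequest:
    """Hypothetical stand-in for the Thrift struct (key names instead of FieldSchema)."""
    partition_keys: List[str]
    apply_distinct: bool = True
    partition_order: Optional[List[str]] = None  # falls back to partition_keys
    ascending: bool = True
    max_parts: int = -1  # -1 means "no limit"


def parse_partition_name(name: str) -> Dict[str, str]:
    """Client-side parsing that get_partition_names() forces on callers."""
    return dict(part.split("=", 1) for part in name.split("/"))


def get_partition_values(names: List[str], req: PartitionValuesRequest) -> List[List[str]]:
    """Assumed semantics: order, project to the requested keys, de-duplicate, limit."""
    parsed = [parse_partition_name(n) for n in names]
    order = req.partition_order or req.partition_keys
    # Values are compared as strings, since every value is returned as a string.
    parsed.sort(key=lambda p: [p[k] for k in order], reverse=not req.ascending)

    rows: List[List[str]] = []
    seen = set()
    for p in parsed:
        row = [p[k] for k in req.partition_keys]  # same position as the input keys
        if req.apply_distinct:
            if tuple(row) in seen:
                continue
            seen.add(tuple(row))
        rows.append(row)

    return rows if req.max_parts < 0 else rows[:req.max_parts]


names = ["dt=20140101/market=us", "dt=20140101/market=in", "dt=20140102/market=us"]

# Equivalent of 'select max(dt)': one key, descending order, at most one row.
latest = get_partition_values(
    names, PartitionValuesRequest(partition_keys=["dt"], ascending=False, max_parts=1)
)
print(latest)  # [['20140102']]

# Both keys, default options: distinct rows ordered by (dt, market).
all_rows = get_partition_values(
    names, PartitionValuesRequest(partition_keys=["dt", "market"])
)
print(all_rows)  # [['20140101', 'in'], ['20140101', 'us'], ['20140102', 'us']]
```

With a row-based response the caller indexes each row directly (row[0] is dt, row[1] is market) instead of repeating the split logic shown in parse_partition_name.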