Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Description
This came up in HCATALOG-328.
InitializeInput is the HCatalog class that queries the HiveMetaStore and stores the query result. It could be improved in the following ways:
- The class has entirely static methods, so a private arg-less constructor should be added to prevent people from accidentally creating instances.
- Instead of querying the HiveMetaStore each time info is requested, the results should be cached after the first query using a key of db+table+filter.
- setInput and getSerializedHcatKeyJobInfo require an existing InputJobInfo argument, but the point of calling those methods is to populate an InputJobInfo with info from the metastore. While this reduces the number of arguments (one object instead of database name, table name, and partition filter), it confuses the user because it's not clear that only db/table/filter should be set on the InputJobInfo that is passed in.
- getSerializedHcatKeyJobInfo should be renamed getInputJobInfo and return an unserialized InputJobInfo. This avoids unnecessary serialization/deserialization in the front-end when it's not necessary to read from the job configuration.
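
A rough sketch of how these suggestions could fit together, for illustration only. The class names InitializeInputSketch, Key, and JobInfo, the metastoreQueries counter, and the queryMetastore helper are all hypothetical stand-ins, not the real HCatalog classes, and the metastore call is faked. It shows a private no-argument constructor, a cache keyed on db+table+filter, and a getInputJobInfo method that takes the three values directly and returns an unserialized object.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for InitializeInput; not the real HCatalog class. */
final class InitializeInputSketch {

    /** Cache key built from database + table + partition filter (filter may be null). */
    static final class Key {
        final String db;
        final String table;
        final String filter;

        Key(String db, String table, String filter) {
            this.db = Objects.requireNonNull(db, "db");
            this.table = Objects.requireNonNull(table, "table");
            this.filter = filter;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return db.equals(k.db) && table.equals(k.table) && Objects.equals(filter, k.filter);
        }

        @Override
        public int hashCode() {
            return Objects.hash(db, table, filter);
        }
    }

    /** Hypothetical, heavily simplified stand-in for InputJobInfo. */
    static final class JobInfo {
        final String db;
        final String table;
        final String filter;

        JobInfo(String db, String table, String filter) {
            this.db = db;
            this.table = table;
            this.filter = filter;
        }
    }

    private static final ConcurrentMap<Key, JobInfo> CACHE = new ConcurrentHashMap<>();

    /** Counts simulated metastore round trips; exists only to make the caching visible. */
    static final AtomicInteger metastoreQueries = new AtomicInteger();

    /** All methods are static, so instantiation is blocked. */
    private InitializeInputSketch() {
        throw new AssertionError("no instances");
    }

    /** Returns an unserialized JobInfo, querying the (simulated) metastore at most once per key. */
    static JobInfo getInputJobInfo(String db, String table, String filter) {
        return CACHE.computeIfAbsent(new Key(db, table, filter), InitializeInputSketch::queryMetastore);
    }

    /** Placeholder for the real metastore lookup (assumption: the real call is more involved). */
    private static JobInfo queryMetastore(Key k) {
        metastoreQueries.incrementAndGet();
        return new JobInfo(k.db, k.table, k.filter);
    }
}
```

Passing db, table, and filter as separate arguments removes the ambiguity about which InputJobInfo fields the caller must set. One thing the sketch does not address is cache invalidation when partitions change in the metastore, which a real implementation would need to consider.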