Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Description
Writing a MapReduce job using HCatalog involves the following step (from [Input and Output Interfaces|http://incubator.apache.org/hcatalog/docs/r0.4.0/inputoutput.html]):
HCatInputFormat.setInput(job, InputJobInfo.create(dbName,
inputTableName, null));
Notice that InputJobInfo is exposed as part of the public API. The purpose of this class is to hold the HiveMetaStore response data and, by being serialized into the jobconf, to pass that information to the backend workers.
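To illustrate why the shape of such a class leaks into compatibility concerns, here is a minimal sketch of the front-end/back-end handoff, assuming plain Java serialization plus Base64 into a string-valued configuration. java.util.Properties stands in for the Hadoop jobconf, and TableInfo and the key "example.hcat.job.info" are hypothetical, not HCatalog's real class or key. Any change to the fields of the serialized class changes what the workers must be able to read back.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;
import java.util.Properties;

public class JobInfoRoundTrip {

    // Hypothetical stand-in for InputJobInfo: only the arguments passed to create().
    static final class TableInfo implements Serializable {
        private static final long serialVersionUID = 1L;
        final String dbName;
        final String tableName;
        final String filter; // may be null, as in the setInput example

        TableInfo(String dbName, String tableName, String filter) {
            this.dbName = dbName;
            this.tableName = tableName;
            this.filter = filter;
        }
    }

    // Hypothetical configuration key (not the key HCatalog actually uses).
    static final String KEY = "example.hcat.job.info";

    // Front end: serialize the object and store it as a string property.
    static void store(Properties conf, TableInfo info) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(info);
        }
        conf.setProperty(KEY, Base64.getEncoder().encodeToString(bytes.toByteArray()));
    }

    // Back end: read the string property and rebuild the object.
    static TableInfo load(Properties conf) throws IOException, ClassNotFoundException {
        byte[] raw = Base64.getDecoder().decode(conf.getProperty(KEY));
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return (TableInfo) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Properties conf = new Properties();
        store(conf, new TableInfo("default", "web_logs", null));
        TableInfo onWorker = load(conf);
        System.out.println(onWorker.dbName + "." + onWorker.tableName);
    }
}
```

Because users construct the object themselves in the current API, they are effectively coupled to this serialized form.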
Really, this class is an implementation detail that should be hidden from users. In HCATALOG-453, which addressed the size of InputJobInfo, we had to watch for changes to this class that could affect users. In HCATALOG-341 we clarified the usage of InputJobInfo because it had caused some confusion.
Ideally we would go through a deprecation cycle and change our interface to hide this implementation detail, which would make the API clearer for users and give us more implementation flexibility. The call above would then become:
HCatInputFormat.setInput(job, dbName, inputTableName, null);
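One possible shape for the deprecation cycle, sketched with self-contained stand-ins rather than the real HCatalog and Hadoop classes: FakeJob, JobInfo, InputFormatSketch and the "example.input.*" keys are all hypothetical. The old overload that takes the info object is kept but marked @Deprecated and simply delegates to the new overload, which builds the info object internally, so existing jobs keep working for a release while new code never touches the info class.

```java
import java.util.Properties;

public class DeprecationSketch {

    // Hypothetical stand-in for a Hadoop Job: just a bag of string properties.
    static final class FakeJob {
        final Properties conf = new Properties();
    }

    // Hypothetical stand-in for InputJobInfo; after the change it would no longer be public API.
    static final class JobInfo {
        final String dbName;
        final String tableName;
        final String filter;

        private JobInfo(String dbName, String tableName, String filter) {
            this.dbName = dbName;
            this.tableName = tableName;
            this.filter = filter;
        }

        static JobInfo create(String dbName, String tableName, String filter) {
            return new JobInfo(dbName, tableName, filter);
        }
    }

    static final class InputFormatSketch {
        /** @deprecated use {@link #setInput(FakeJob, String, String, String)} instead. */
        @Deprecated
        static void setInput(FakeJob job, JobInfo info) {
            // Old entry point kept for one release; it only delegates to the new one.
            setInput(job, info.dbName, info.tableName, info.filter);
        }

        // New entry point: callers pass plain identifiers; the info object is built internally.
        static void setInput(FakeJob job, String dbName, String tableName, String filter) {
            JobInfo info = JobInfo.create(dbName, tableName, filter);
            // A real implementation would query the metastore here and serialize the result.
            job.conf.setProperty("example.input.table", info.dbName + "." + info.tableName);
            if (info.filter != null) {
                job.conf.setProperty("example.input.filter", info.filter);
            }
        }
    }

    public static void main(String[] args) {
        FakeJob job = new FakeJob();
        InputFormatSketch.setInput(job, "default", "web_logs", null);
        System.out.println(job.conf.getProperty("example.input.table"));
    }
}
```

Since both overloads end up in the same code path, the internal representation can later change without touching user code that has moved to the new signature.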
Thoughts?