Thanks Jarek Jarcec Cecho for your review and comments.
I initially had the database specified separately, but the HCatalog team thought it made more sense to have the two together (as that is how they address tables). That said, adding a separate database option should not be difficult if we need to make it more compliant.
It is true that schema inference is a great feature. I thought of adding it in a follow-on JIRA along with some additional constructs, so that we still give users storage-type independence if they so desire: for example, letting them pick whatever format they choose (as long as it is supported by HCatalog) as the default, and pre-creating tables when a specific output file type is desired. I will create a subtask and get it into this task itself.
The main issue with not supporting --hive-drop-import-delims is that string columns with embedded delimiter characters, when written in delimited text format, will have the same fidelity issues that current users face. I considered supporting it but was not sure it was worth the extra processing for all output types; I wanted the Sqoop code to be agnostic of the storage format (and not have to query the metadata for storage information). Users still have the option of using the current Hive import, which is well understood, to deal with that case if desired. A sketch of the per-field processing this would imply is below.
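For concreteness, here is a minimal sketch of the per-field scan that delimiter dropping implies for text output. This is illustrative only, not the actual Sqoop implementation; the class and method names are hypothetical, though the delimiters are Hive's defaults (\001 as the field separator, \n as the row terminator):

{code:java}
// Hypothetical sketch of the per-field processing that delimiter
// dropping requires for delimited text output. Hive's defaults are
// \001 (field separator) and \n (row terminator); \r also breaks rows.
public final class DelimStripper {
  // Remove characters that would corrupt a delimited text row.
  public static String dropHiveDelims(String value) {
    if (value == null) {
      return null;
    }
    StringBuilder sb = new StringBuilder(value.length());
    for (int i = 0; i < value.length(); i++) {
      char c = value.charAt(i);
      if (c != '\u0001' && c != '\n' && c != '\r') {
        sb.append(c);
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // The embedded newline and \001 would otherwise split this value
    // into extra rows/fields on import.
    System.out.println(dropHiveDelims("line1\nline2\u0001end"));
  }
}
{code}

The cost concern above is that this scan would have to run on every string value regardless of the chosen storage format, even though only delimited text actually needs it.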
The direct option does not deal with the Sqoop record type, so we would have to come up with an HCatalog implementation for each connection manager, based on parsing its native input/output row format. For example, in Netezza direct mode the Sqoop ORM scheme is not involved, so we do not even generate the jar files. I think the existing Hive import mechanism can be used where applicable (I am not sure it works with all connection managers, but since the direct output is text format, the existing Hive import support should help; see the sketch below for what a dedicated HCatalog path would involve). As you know, HCatalog uses the Hive metastore, so tables imported that way are also available to HCatalog users.
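To make the per-connection-manager point concrete: direct mode hands Sqoop the database's native delimited text rather than generated SqoopRecord instances, so a dedicated HCatalog writer would first need a parser for each manager's output format. A rough sketch with hypothetical names (the '|' delimiter is an assumption, not necessarily what Netezza emits):

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: direct mode yields raw delimited text, not
// SqoopRecord instances, so an HCatalog writer would need a parser
// matched to each connection manager's native output format.
public final class DirectRowParser {
  private final char fieldDelim;

  public DirectRowParser(char fieldDelim) {
    this.fieldDelim = fieldDelim;
  }

  // Split one raw output row into field values; a real implementation
  // would also have to handle escape and enclosing characters.
  public List<String> parse(String rawRow) {
    List<String> fields = new ArrayList<>();
    int start = 0;
    for (int i = 0; i <= rawRow.length(); i++) {
      if (i == rawRow.length() || rawRow.charAt(i) == fieldDelim) {
        fields.add(rawRow.substring(start, i));
        start = i + 1;
      }
    }
    // These string fields would then be converted to the HCatalog
    // column types and written out as a record.
    return fields;
  }

  public static void main(String[] args) {
    // Assuming '|' as the delimiter used by the direct-mode dump.
    System.out.println(new DirectRowParser('|').parse("1|alice|2013-06-01"));
  }
}
{code}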
Regarding running as part of the normal test suite, I totally understand; I did not want it to be a manual test either. If you look at the test utilities, I use a mini-cluster-style MR job to do the loading into and reading from HCatalog, since HCatalog does not provide an HCatalogMiniCluster for unit testing. When I first tried to run everything in local mode (which is supported), the Hive tests failed, because we depend on the absence of certain classes to distinguish between external Hive CLI and in-process invocation. That is why I had to exclude some of the Hive classes from the dependencies to make all the unit tests run. Let me see if there is a way to accommodate both use cases (by introducing additional test parameters that force external Hive CLI usage backed by the mock Hive utilities we have in the unit test framework) and still have HCatalog run as part of the unit tests; a sketch of such a switch is below.
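One possible shape for such a test parameter. This is only a sketch under my assumptions: the property name sqoop.test.hive.use.external.cli is made up for illustration, and the classpath probe stands in for whatever class-presence check the test framework actually relies on:

{code:java}
// Hypothetical sketch of a test-only switch that forces the external
// Hive CLI path (backed by the mock Hive utilities) even when the
// real Hive classes are on the classpath for the HCatalog tests.
public final class HiveTestMode {
  // Hypothetical property name; not an existing Sqoop flag.
  private static final String FORCE_EXTERNAL_CLI =
      "sqoop.test.hive.use.external.cli";

  public static boolean useExternalHiveCli() {
    if (Boolean.getBoolean(FORCE_EXTERNAL_CLI)) {
      return true;
    }
    // Current behavior being approximated: fall back to the external
    // CLI only when the in-process Hive entry point is not present.
    try {
      Class.forName("org.apache.hadoop.hive.cli.CliDriver");
      return false; // in-process invocation is possible
    } catch (ClassNotFoundException e) {
      return true;  // no Hive on the classpath; shell out to hive
    }
  }
}
{code}

Tests could then be run with -Dsqoop.test.hive.use.external.cli=true so the Hive tests exercise the mocked external CLI path while the HCatalog tests still see the real Hive classes on the classpath.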