Loading our test data on a fresh CDH cluster fails when creating an external table in Hive on top of HBase with 500 columns. The reason is that the column mapping for this table is a string of ~7700 characters, but the column definition in Hive only allows 4000 characters.
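To illustrate how quickly the mapping string grows, here is a minimal sketch that builds an hbase.columns.mapping value for 500 columns and compares its length to the 4000-character limit. The column family "d" and the qualifier names "col_0000", "col_0001", ... are hypothetical placeholders, not the names in our test data. The real names are evidently longer, since the real string is ~7700 characters.

```python
# Hypothetical names: the column family "d" and the "col_NNNN" qualifiers are
# placeholders chosen for illustration, not the names used by the test data.
METASTORE_LIMIT = 4000  # length limit reported above for the column definition


def build_hbase_columns_mapping(num_columns, family="d", prefix="col_"):
    """Build an hbase.columns.mapping value: the row key plus one entry per column."""
    entries = [":key"]
    entries += [f"{family}:{prefix}{i:04d}" for i in range(num_columns)]
    return ",".join(entries)


mapping = build_hbase_columns_mapping(500)
print(len(mapping), len(mapping) > METASTORE_LIMIT)
```

Even with these short placeholder names, the mapping already exceeds the limit, so any table of this width would presumably hit the same failure.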
There is a Hive bug tracking this issue: https://issues.apache.org/jira/browse/HIVE-9815. The real question, though, is why the schema definition for our local Hive metastore database is different from the one we ship, and how we can correct this bug.
Interestingly, the PostgreSQL script we use to create the metastore also defines the column with a 4000-character limit. That leaves the question of why the test isn't failing.
Another nit: I just checked hive_impala_dump_cdh5-585.txt, and there the column definition for SERDE_PARAMS.PARAM_VALUE has no length limitation.