Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
1.5.1
None
Description
Ambari installations of Hive currently do not set any DataNucleus-related properties. DataNucleus has a level 2 (L2) cache that is harmful to Hive in a distributed environment when it is enabled. (With a lone embedded Hive instance and no other code paths to the database it is fine, but that never happens in a distributed environment.)
If no setting is present, DataNucleus enables the L2 cache by default, so Hive overrides that by turning the cache off when no other setting is configured.
Now, in a war of "defaults", the Hive default should win, but we have had recurring support issues from clients who turn the cache on expecting improved performance. Thus, I'd like the Ambari-installed hive-site.xml to explicitly turn this config parameter off, with a comment asking users not to switch it on because it impacts Hive negatively.
The parameter in question is "datanucleus.cache.level2.type", and its value should be "none". (Note that I've seen some older configs that set things like datanucleus.cache.level2 = false. That is a bogus setting that does nothing, and it should not be assumed to be a catch-all switch.)
For the comment, I'd like something to the effect of: "Disables datanucleus l2 cache. This must be set to 'none' for hive to work properly".
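For illustration, the requested hive-site.xml entry might look roughly like the fragment below. This is a sketch that assumes the usual Hadoop-style property layout and carries the comment in a description element; how Ambari actually templates the file and where it places the comment are not specified in this issue.

```xml
<!-- Sketch only: the use of a <description> element for the comment is an assumption. -->
<property>
  <name>datanucleus.cache.level2.type</name>
  <value>none</value>
  <description>Disables datanucleus l2 cache. This must be set to 'none' for hive to work properly.</description>
</property>
```

An older-style entry such as datanucleus.cache.level2 = false is not a substitute for this property, since it has no effect.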