Prasad and I had some offline discussions. First of all, some high-level conclusions:
0. Lifetime of the objects: The lifetime of every Hive operator is contained in the lifetime of its configuration. The lifetime of every Task (ExecDriver, MapRedTask, MoveTask, etc.) is contained in the lifetime of the db connection. The lifetime of a CommandProcessor is also contained in the lifetime of the db connection (although some CommandProcessors exist precisely to switch or close the db connection, since that is the operation they perform). The CliDriver and HiveServerHandler should each correspond to a session.
1. ThreadLocal is good for keeping thread-local variables for ease of use (no need to pass the object around or supply it in the constructor), ease of debugging (because there is no chance that another thread will change the content of the thread-local storage), and security (for the same reason as debugging).
2. Passing objects in the constructor (and adding a getter) makes the program flow easier to follow, and also allows the same thread to hold 2 different objects (e.g. two db connections). The latter is not a strong need, since all db connection calls are blocking: we can always switch the thread-local db connection to the correct one before we start each call.
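To make points 1 and 2 concrete, here is a rough sketch of the two styles side by side. It is only an illustration: `ThreadLocalSketch`, `DbConnection`, `Task` and the helper methods are hypothetical names made up for this example, not the actual Hive classes.

```java
import java.util.concurrent.atomic.AtomicReference;

public class ThreadLocalSketch {
    // Hypothetical stand-in for the db connection; not the real Hive class.
    static final class DbConnection {
        final String name;
        DbConnection(String name) { this.name = name; }
    }

    // One slot per thread: other threads cannot see or change this value.
    private static final ThreadLocal<DbConnection> CURRENT = new ThreadLocal<>();

    static void set(DbConnection db) { CURRENT.set(db); }
    static DbConnection get() { return CURRENT.get(); }

    // Style 1 (point 1): the code reads the thread-local connection implicitly.
    static String runWithThreadLocal() {
        return "query on " + get().name;
    }

    // Style 2 (point 2): the connection is passed in the constructor,
    // with a getter, so the data flow is visible at the call site.
    static final class Task {
        private final DbConnection db;
        Task(DbConnection db) { this.db = db; }
        DbConnection getDb() { return db; }
        String run() { return "query on " + db.name; }
    }

    // Returns what a freshly started thread finds in the thread-local slot.
    static DbConnection seenByOtherThread() {
        AtomicReference<DbConnection> seen = new AtomicReference<>();
        Thread t = new Thread(() -> seen.set(get()));
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.get();
    }

    public static void main(String[] args) {
        // Calls are blocking, so one thread can serve two connections by
        // switching the thread-local value before each call.
        set(new DbConnection("a"));
        System.out.println(runWithThreadLocal());
        set(new DbConnection("b"));
        System.out.println(runWithThreadLocal());

        // A new thread does not inherit the value set above.
        System.out.println(seenByOtherThread() == null);

        // Constructor style: two connections coexist without any switching.
        System.out.println(new Task(new DbConnection("c")).run());
    }
}
```

The switching in `main` is what point 2 refers to: because each call blocks, setting the thread-local value right before the call is enough, at the cost of a less visible data flow.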
Second, we have to choose among the following 3 options:
A. Make Hive consist of the db connection and the conf, and make it thread-local. Tasks (including ExecDriver), CommandProcessors (including Driver), and CliDriver/HiveServerHandler will all access this thread-local db connection and conf. Make SessionState consist of the other session-specific things like stdout, history, etc. (but NOT Hive). SessionState is also thread-local; CliDriver will access SessionState for this information and access Hive for the db connection etc. So Hive and SessionState are independent, and both are thread-local.
B. Make Hive consist of the db connection and the conf. Pass Hive as a constructor/initialize parameter to all Tasks (including ExecDriver) and CommandProcessors (including Driver). Make SessionState consist of session-specific things like stdout, history, and ALSO Hive. CliDriver/HiveServerHandler will use the thread-local SessionState for everything. So only SessionState is thread-local (while Hive is part of it).
C. The same as B, except that SessionState is passed as a parameter to the constructor of CliDriver/HiveServerHandler. So there is no thread-local storage at all.
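As a rough illustration of option B, the sketch below keeps only the session thread-local and hands Hive to the task through its constructor. All names here (`OptionBSketch`, `HiveHandle`, `Session`, `SketchTask`, `runCommand`) are hypothetical stand-ins for Hive, SessionState, a Task and the CliDriver logic; the real classes may look quite different.

```java
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class OptionBSketch {
    // Hypothetical stand-in for Hive: the db connection plus the conf.
    static final class HiveHandle {
        final String dbConnection; // placeholder for a real connection object
        final Properties conf;
        HiveHandle(String dbConnection, Properties conf) {
            this.dbConnection = dbConnection;
            this.conf = conf;
        }
    }

    // Hypothetical stand-in for SessionState: stdout, history, and ALSO Hive.
    static final class Session {
        private static final ThreadLocal<Session> CURRENT = new ThreadLocal<>();

        final PrintStream out;
        final List<String> history = new ArrayList<>();
        final HiveHandle hive;

        Session(PrintStream out, HiveHandle hive) {
            this.out = out;
            this.hive = hive;
        }

        // Only the session is thread-local in option B.
        static void start(Session s) { CURRENT.set(s); }
        static Session get() { return CURRENT.get(); }
    }

    // A task never touches the session; it only needs a HiveHandle.
    static final class SketchTask {
        private final HiveHandle hive;
        SketchTask(HiveHandle hive) { this.hive = hive; }
        String execute(String cmd) { return cmd + " @ " + hive.dbConnection; }
    }

    // Session-level driver logic: looks up the thread-local session,
    // records history, and passes only the HiveHandle down to the task.
    static String runCommand(String cmd) {
        Session s = Session.get();
        s.history.add(cmd);
        String result = new SketchTask(s.hive).execute(cmd);
        s.out.println(result);
        return result;
    }

    public static void main(String[] args) {
        HiveHandle hive = new HiveHandle("db-1", new Properties());
        Session.start(new Session(System.out, hive));
        runCommand("show tables");
    }
}
```

Under the same assumptions, option A would put the `ThreadLocal` on `HiveHandle` as well and drop the `hive` field from `Session`, while option C would remove the `ThreadLocal` entirely and pass the `Session` into the driver's constructor.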
The benefit shared by all 3 options is that Tasks (including ExecDriver) and CommandProcessors (including Driver) don't need to deal with the session - they just need a Hive.
NOTE: All Hive operators (in the mapper and reducer) should continue to use the configuration that is passed to them, to conform to the Hadoop model.