Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
With hive.exec.parallel=true, Driver.launchTask() calls Task.initialize() on several Tasks from a single thread. It then starts new threads to run those tasks.
Task.initialize() gets an instance of Hive and holds on to it. Hive.java internally uses a ThreadLocal to hand out instances, but since Task.initialize() is called by a single thread from the Driver, multiple tasks share one instance of Hive.
Each Hive instance has a single instance of MetaStoreClient; the latter is not thread safe.
Because different threads actually execute the tasks when hive.exec.parallel=true, those threads end up sharing the same MetaStoreClient.
If two threads make concurrent calls, for example to Hive.getTable(String), the Thrift responses may be delivered to the wrong caller.
The first caller then gets an "out of sequence response" error, drops the message, and reconnects. If the timing is right, it will consume the other caller's response, and the other caller will block for hive.metastore.client.socket.timeout because its response message has been lost.
This is just one concrete example.
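The sharing described above can be reproduced in isolation. The following is a minimal sketch, not Hive code: FakeHive and FakeTask are hypothetical stand-ins for Hive and Task, and the only behavior assumed is the one described in this report (a ThreadLocal-backed getter, and an initialize() that caches the result on the calling thread).

```java
class SharedHiveDemo {
    // Hypothetical stand-in for Hive: hands out one instance per calling thread.
    static class FakeHive {
        private static final ThreadLocal<FakeHive> PER_THREAD =
                ThreadLocal.withInitial(FakeHive::new);

        static FakeHive get() {
            return PER_THREAD.get();
        }
    }

    // Hypothetical stand-in for Task: initialize() caches whatever instance
    // belongs to the thread that calls it.
    static class FakeTask {
        FakeHive db;
        FakeHive usedAtRun;

        void initialize() {
            db = FakeHive.get();
        }

        void run() {
            usedAtRun = db; // the worker thread reuses the cached instance
        }
    }

    // Returns true when two tasks, run on different threads, used the same instance.
    static boolean tasksShareInstance() {
        FakeTask first = new FakeTask();
        FakeTask second = new FakeTask();
        // Both initialize() calls happen on the calling ("driver") thread.
        first.initialize();
        second.initialize();

        Thread a = new Thread(first::run);
        Thread b = new Thread(second::run);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return first.usedAtRun == second.usedAtRun;
    }

    public static void main(String[] args) {
        System.out.println("tasks share one instance: " + tasksShareInstance());
    }
}
```

In this sketch the ThreadLocal gives no isolation, because the lookup happens once on the driver thread and the cached reference is then carried into both worker threads.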
One possible fix is to make Task.db use ThreadLocal.
This could be related to HIVE-6893
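One way to read the ThreadLocal suggestion above is to resolve the Hive instance on the thread that executes the task rather than on the thread that initializes it. The sketch below illustrates that idea only. FakeHive, FakeTask, and getHive() are hypothetical names, and this is not the patch for this issue.

```java
class PerThreadHiveDemo {
    // Hypothetical stand-in for Hive: hands out one instance per calling thread.
    static class FakeHive {
        private static final ThreadLocal<FakeHive> PER_THREAD =
                ThreadLocal.withInitial(FakeHive::new);

        static FakeHive get() {
            return PER_THREAD.get();
        }
    }

    // Hypothetical stand-in for Task: nothing is cached in initialize().
    static class FakeTask {
        FakeHive usedAtRun;

        void initialize() {
            // Deliberately no FakeHive lookup on the driver thread.
        }

        // Looked up lazily, so the result belongs to the executing thread.
        FakeHive getHive() {
            return FakeHive.get();
        }

        void run() {
            usedAtRun = getHive();
        }
    }

    // Returns true when two tasks, run on different threads, used different instances.
    static boolean tasksGetDistinctInstances() {
        FakeTask first = new FakeTask();
        FakeTask second = new FakeTask();
        first.initialize();
        second.initialize();

        Thread a = new Thread(first::run);
        Thread b = new Thread(second::run);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return first.usedAtRun != null
                && second.usedAtRun != null
                && first.usedAtRun != second.usedAtRun;
    }

    public static void main(String[] args) {
        System.out.println("tasks use distinct instances: " + tasksGetDistinctInstances());
    }
}
```

The trade-off in this sketch is that each worker thread creates its own instance, which in the real system would presumably mean one metastore connection per thread.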
Issue Links
- is related to
  - HIVE-1033 change default value of hive.exec.parallel to true (Patch Available)
- relates to
  - HIVE-6893 out of sequence error in HiveMetastore server (Resolved)
  - HIVE-10410 Apparent race condition in HiveServer2 causing intermittent query failures (Resolved)
  - HIVE-10677 hive.exec.parallel=true has problem when it is used for analyze table column stats (Resolved)