Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Description
Repro steps:
1. Happy path, parallel execution disabled
0: jdbc:hive2://localhost:10000> set hive.exec.parallel=false;
No rows affected (0.023 seconds)
0: jdbc:hive2://localhost:10000> select count (*) from t1;
INFO : Compiling command(queryId=karencoppage_20191028152610_a26c25e1-9834-446a-9a56-c676cb693e7d): select count (*) from t1
INFO : Semantic Analysis Completed
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:c0, type:bigint, comment:null)], properties:null)
INFO : Completed compiling command(queryId=karencoppage_20191028152610_a26c25e1-9834-446a-9a56-c676cb693e7d); Time taken: 0.309 seconds
INFO : Executing command(queryId=karencoppage_20191028152610_a26c25e1-9834-446a-9a56-c676cb693e7d): select count (*) from t1
WARN :
INFO : Query ID = karencoppage_20191028152610_a26c25e1-9834-446a-9a56-c676cb693e7d
INFO : Total jobs = 1
INFO : Launching Job 1 out of 1
INFO : Starting task [Stage-1:MAPRED] in serial mode
INFO : Number of reduce tasks determined at compile time: 1
INFO : In order to change the average load for a reducer (in bytes):
INFO :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO : In order to limit the maximum number of reducers:
INFO :   set hive.exec.reducers.max=<number>
INFO : In order to set a constant number of reducers:
INFO :   set mapreduce.job.reduces=<number>
DEBUG : Configuring job job_local495362389_0008 with file:/tmp/hadoop/mapred/staging/karencoppage495362389/.staging/job_local495362389_0008 as the submit dir
DEBUG : adding the following namenodes' delegation tokens:[file:///]
DEBUG : Creating splits at file:/tmp/hadoop/mapred/staging/karencoppage495362389/.staging/job_local495362389_0008
INFO : number of splits:0
INFO : Submitting tokens for job: job_local495362389_0008
INFO : Executing with tokens: []
INFO : The url to track the job: http://localhost:8080/
INFO : Job running in-process (local Hadoop)
INFO : 2019-10-28 15:26:22,537 Stage-1 map = 0%, reduce = 100%
INFO : Ended Job = job_local495362389_0008
INFO : MapReduce Jobs Launched:
INFO : Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 SUCCESS
INFO : Total MapReduce CPU Time Spent: 0 msec
INFO : Completed executing command(queryId=karencoppage_20191028152610_a26c25e1-9834-446a-9a56-c676cb693e7d); Time taken: 6.497 seconds
INFO : OK
DEBUG : Shutting down query select count (*) from t1
+-----+
| c0  |
+-----+
| 0   |
+-----+
1 row selected (11.874 seconds)
2. Faulty path, parallel execution enabled
0: jdbc:hive2://localhost:10000> set hive.server2.logging.operation.level=EXECUTION;
No rows affected (0.236 seconds)
0: jdbc:hive2://localhost:10000> set hive.exec.parallel=true;
No rows affected (0.01 seconds)
0: jdbc:hive2://localhost:10000> select count (*) from t1;
INFO : Compiling command(queryId=karencoppage_20191028155346_4e7b793b-654e-4d69-b588-f3f0d3ae0c77): select count (*) from t1
INFO : Semantic Analysis Completed
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:c0, type:bigint, comment:null)], properties:null)
INFO : Completed compiling command(queryId=karencoppage_20191028155346_4e7b793b-654e-4d69-b588-f3f0d3ae0c77); Time taken: 4.707 seconds
INFO : Executing command(queryId=karencoppage_20191028155346_4e7b793b-654e-4d69-b588-f3f0d3ae0c77): select count (*) from t1
WARN :
INFO : Query ID = karencoppage_20191028155346_4e7b793b-654e-4d69-b588-f3f0d3ae0c77
INFO : Total jobs = 1
INFO : Launching Job 1 out of 1
INFO : Starting task [Stage-1:MAPRED] in parallel
INFO : MapReduce Jobs Launched:
INFO : Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 SUCCESS
INFO : Total MapReduce CPU Time Spent: 0 msec
INFO : Completed executing command(queryId=karencoppage_20191028155346_4e7b793b-654e-4d69-b588-f3f0d3ae0c77); Time taken: 44.577 seconds
INFO : OK
DEBUG : Shutting down query select count (*) from t1
+-----+
| c0  |
+-----+
| 0   |
+-----+
1 row selected (54.665 seconds)

Note that in the second run the per-task MapReduce progress lines (reducer tuning hints, job submission, "Ended Job", etc.) never reach the operation log/Beeline; only the lines logged from the session thread appear.
The issue is that Log4j stores the session ID and query ID in per-thread context metadata (the map returned by org.apache.logging.log4j.ThreadContext.getImmutableContext()). If the queryId is missing from this metadata, the RoutingAppender (defined programmatically in LogDivertAppender) routes the log event to a NullAppender, which logs nothing. If the queryId is present, the RoutingAppender routes the event to the "query-appender", which writes the line to the operation log and Beeline. With parallel execution enabled this routing breaks, because the new threads created for parallel query execution do not inherit the queryId/sessionId metadata from the session thread.
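A minimal, self-contained sketch of the underlying behavior (not Hive code; assumes Log4j 2's default, non-inheritable ThreadContext map). The queryId set on the "session" thread is invisible on a child thread, so any appender routing keyed on that value falls back to its default route there:

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.apache.logging.log4j.ThreadContext;

public class ThreadContextDemo {
    private static final Logger LOG = LogManager.getLogger(ThreadContextDemo.class);

    public static void main(String[] args) throws InterruptedException {
        // Set on the parent ("session") thread, as HiveServer2 does per operation.
        ThreadContext.put("queryId", "demo_query_1");
        LOG.info("parent queryId = {}", ThreadContext.get("queryId")); // demo_query_1

        Thread worker = new Thread(() ->
            // Prints null: the child thread has its own, empty ThreadContext map,
            // so a RoutingAppender keyed on queryId cannot resolve the query route.
            LOG.info("child queryId = {}", ThreadContext.get("queryId")));
        worker.start();
        worker.join();
    }
}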
The solution is to propagate the queryId/sessionId metadata to any new threads created for MR work, as sketched below.
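A minimal sketch of the fix idea (a hypothetical wrapper, not the actual Hive patch): snapshot the launching thread's ThreadContext and re-apply it inside the worker before the task runs, so the RoutingAppender can still resolve the queryId route on the new thread. ContextPropagatingRunnable is an illustrative name only.

import java.util.Map;
import org.apache.logging.log4j.ThreadContext;

public final class ContextPropagatingRunnable implements Runnable {
    private final Map<String, String> parentContext; // queryId/sessionId captured here
    private final Runnable task;

    public ContextPropagatingRunnable(Runnable task) {
        // Snapshot taken on the thread that constructs the wrapper (the session thread).
        this.parentContext = ThreadContext.getImmutableContext();
        this.task = task;
    }

    @Override
    public void run() {
        // Restore queryId/sessionId on the worker thread before any logging happens.
        ThreadContext.putAll(parentContext);
        try {
            task.run();
        } finally {
            ThreadContext.clearMap(); // avoid leaking context if the thread is pooled
        }
    }
}

Usage would be along the lines of: new Thread(new ContextPropagatingRunnable(mrTask)).start();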
Attachments
Issue Links