Description
I'm testing the join on hdfs and JDBC storage, and found that the worker resource is not collected properly after a query failure.
Here is the log.
tpch> select l_orderkey, sum(l_extendedprice*(1-l_discount)) as revenue, o_orderdate, o_shippriority from pgsql_db.customer as c join pgsql_db.orders as o on c.c_mktsegment = 'BUILDING' and c.c_custkey = o.o_custkey join lineitem as l on l.l_orderkey = o.o_orderkey where o_orderdate < '1995-03-15'::date and l_shipdate > '1995-03-15'::date group by l_orderkey, o_orderdate, o_shippriority order by revenue desc, o_orderdate; ... 2015-09-23 01:22:36,776 INFO org.apache.tajo.worker.TaskImpl: Initializing: ta_1442938691159_0002_000002_000000_02 2015-09-23 01:22:36,777 INFO org.apache.tajo.worker.TaskImpl: ================================== 2015-09-23 01:22:36,777 INFO org.apache.tajo.worker.TaskImpl: * Stage ta_1442938691159_0002_000002_000000_02 is initialized 2015-09-23 01:22:36,777 INFO org.apache.tajo.worker.TaskImpl: * InterQuery: true, Use HASH_SHUFFLE shuffle, Fragments (num: 1), Fetches (total:0) : 2015-09-23 01:22:36,777 INFO org.apache.tajo.worker.TaskImpl: * Local task dir: file:/disk2/tajo-tmp/q_1442938691159_0002/output/2/0_2 2015-09-23 01:22:36,777 INFO org.apache.tajo.worker.TaskImpl: ================================== 2015-09-23 01:22:36,778 INFO org.apache.tajo.storage.jdbc.JdbcScanner: Generated SQL: SELECT o_orderkey,o_custkey,o_orderdate,o_shippriority FROM orders WHERE o.o_orderdate < DATE '1995-03-15' 2015-09-23 01:22:36,783 ERROR org.apache.tajo.worker.TaskImpl: internal error: ERROR: missing FROM-clause entry for table "o" Position: 74 org.apache.tajo.exception.TajoInternalError: internal error: ERROR: missing FROM-clause entry for table "o" Position: 74 at org.apache.tajo.storage.jdbc.JdbcScanner.executeQueryAndGetIter(JdbcScanner.java:263) at org.apache.tajo.storage.jdbc.JdbcScanner.next(JdbcScanner.java:116) at org.apache.tajo.engine.planner.physical.FullScanIterator.hasNext(FullScanIterator.java:39) at org.apache.tajo.engine.planner.physical.SeqScanExec.next(SeqScanExec.java:292) at org.apache.tajo.engine.planner.physical.HashShuffleFileWriteExec.next(HashShuffleFileWriteExec.java:102) at org.apache.tajo.worker.TaskImpl.run(TaskImpl.java:405) at org.apache.tajo.worker.TaskContainer.run(TaskContainer.java:65) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2015-09-23 01:22:36,783 INFO org.apache.tajo.worker.TaskImpl: ta_1442938691159_0002_000002_000000_02 completed. Worker's task counter - total:3, succeeded: 0, killed: 0, failed: 3 2015-09-23 01:22:36,784 ERROR org.apache.tajo.querymaster.TaskAttempt: ta_1442938691159_0002_000002_000000_02 FROM jihoon-desktop >> internal error: ERROR: missing FROM-clause entry for table "o" Position: 74 2015-09-23 01:22:36,784 INFO org.apache.tajo.querymaster.Task: ==================================================================================== 2015-09-23 01:22:36,784 INFO org.apache.tajo.querymaster.Task: >>> Task Failed: ta_1442938691159_0002_000002_000000_02, retry:false, attempts:3 <<< 2015-09-23 01:22:36,784 INFO org.apache.tajo.querymaster.Task: ==================================================================================== 2015-09-23 01:22:36,784 INFO org.apache.tajo.querymaster.DefaultTaskScheduler: TaskScheduler schedulingThread stopped 2015-09-23 01:22:36,784 INFO org.apache.tajo.querymaster.DefaultTaskScheduler: Task Scheduler stopped 2015-09-23 01:22:36,785 INFO org.apache.tajo.querymaster.Stage: Stage completed - eb_1442938691159_0002_000002 (total=1, success=0, killed=0) 2015-09-23 01:22:36,785 INFO org.apache.tajo.querymaster.DefaultTaskScheduler: Task Scheduler stopped 2015-09-23 01:22:36,785 INFO org.apache.tajo.querymaster.Query: Processing q_1442938691159_0002 of type STAGE_COMPLETED 2015-09-23 01:22:36,785 INFO org.apache.tajo.querymaster.Query: Complete Stage[eb_1442938691159_0002_000002], State: FAILED, 1/2. 2015-09-23 01:22:36,785 INFO org.apache.tajo.querymaster.DefaultTaskScheduler: Assigned host : jihoon-desktop, Unknown Volume : -1, Concurrency : 1 2015-09-23 01:22:36,785 INFO org.apache.tajo.storage.HashShuffleAppenderManager: Close HashShuffleAppender:eb_1442938691159_0002_000002, not a hash shuffle 2015-09-23 01:22:36,786 ERROR org.apache.tajo.querymaster.Stage: Can't handle this event at current state, eventType:SQ_SHUFFLE_REPORT, oldState:FAILED, nextState:FAILED org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: SQ_SHUFFLE_REPORT at FAILED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.tajo.querymaster.Stage.handle(Stage.java:797) at org.apache.tajo.querymaster.QueryMasterTask$StageEventDispatcher.handle(QueryMasterTask.java:210) at org.apache.tajo.querymaster.QueryMasterTask$StageEventDispatcher.handle(QueryMasterTask.java:204) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:745) 2015-09-23 01:22:36,786 INFO org.apache.tajo.querymaster.DefaultTaskScheduler: Task Scheduler stopped 2015-09-23 01:22:36,786 INFO org.apache.tajo.worker.TaskManager: Stopped execution block:eb_1442938691159_0002_000002 2015-09-23 01:22:36,787 INFO org.apache.tajo.querymaster.Query: Processing q_1442938691159_0002 of type STAGE_COMPLETED 2015-09-23 01:22:36,787 INFO org.apache.tajo.querymaster.Query: Complete Stage[eb_1442938691159_0002_000002], State: ERROR, 2/2. 2015-09-23 01:22:36,787 INFO org.apache.tajo.querymaster.Query: Processing q_1442938691159_0002 of type QUERY_COMPLETED 2015-09-23 01:22:36,787 INFO org.apache.tajo.querymaster.Query: q_1442938691159_0002 Query Transitioned from QUERY_RUNNING to QUERY_ERROR 2015-09-23 01:22:36,787 INFO org.apache.tajo.querymaster.QueryMasterTask: Query completion notified from q_1442938691159_0002 final state: QUERY_ERROR 2015-09-23 01:22:36,787 INFO org.apache.tajo.worker.TaskImpl: Task basedir is created (file:/disk2/tajo-tmp/q_1442938691159_0002/output/1) 2015-09-23 01:22:36,787 INFO org.apache.tajo.querymaster.DefaultTaskScheduler: Assigned host : jihoon-desktop, Unknown Volume : -1, Concurrency : 2 2015-09-23 01:22:36,787 INFO org.apache.tajo.worker.TaskImpl: Initializing: ta_1442938691159_0002_000001_000002_00 2015-09-23 01:22:36,788 INFO org.apache.tajo.worker.TaskImpl: ================================== 2015-09-23 01:22:36,788 INFO org.apache.tajo.querymaster.QueryMasterTask: Stopping QueryMasterTask:q_1442938691159_0002 2015-09-23 01:22:36,788 INFO org.apache.tajo.worker.TaskImpl: * Stage ta_1442938691159_0002_000001_000002_00 is initialized 2015-09-23 01:22:36,789 INFO org.apache.tajo.worker.TaskImpl: * InterQuery: true, Use HASH_SHUFFLE shuffle, Fragments (num: 1), Fetches (total:0) : 2015-09-23 01:22:36,789 INFO org.apache.tajo.worker.TaskImpl: * Local task dir: file:/disk2/tajo-tmp/q_1442938691159_0002/output/1/2_0 2015-09-23 01:22:36,789 INFO org.apache.tajo.worker.TaskImpl: ================================== 2015-09-23 01:22:36,789 INFO org.apache.tajo.querymaster.QueryMasterTask: Cleanup resources of all workers. Query: q_1442938691159_0002, workers: 1 2015-09-23 01:22:36,789 INFO org.apache.tajo.querymaster.QueryMasterTask: Stopped QueryMasterTask:q_1442938691159_0002 2015-09-23 01:22:36,790 INFO org.apache.tajo.worker.TaskImpl: Task basedir is created (file:/disk2/tajo-tmp/q_1442938691159_0002/output/1) 2015-09-23 01:22:36,790 INFO org.apache.tajo.worker.TaskImpl: Initializing: ta_1442938691159_0002_000001_000003_00 2015-09-23 01:22:36,792 INFO org.apache.tajo.querymaster.DefaultTaskScheduler: Assigned host : jihoon-desktop, Unknown Volume : -1, Concurrency : 3 2015-09-23 01:22:36,792 INFO org.apache.tajo.worker.TaskImpl: ================================== 2015-09-23 01:22:36,792 INFO org.apache.tajo.worker.TaskImpl: * Stage ta_1442938691159_0002_000001_000003_00 is initialized 2015-09-23 01:22:36,792 INFO org.apache.tajo.worker.TaskImpl: * InterQuery: true, Use HASH_SHUFFLE shuffle, Fragments (num: 1), Fetches (total:0) : 2015-09-23 01:22:36,792 INFO org.apache.tajo.worker.TaskImpl: * Local task dir: file:/disk2/tajo-tmp/q_1442938691159_0002/output/1/3_0 2015-09-23 01:22:36,792 INFO org.apache.tajo.worker.TaskImpl: ================================== 2015-09-23 01:22:36,795 INFO org.apache.tajo.worker.TaskManager: QueryMaster Address:jihoon-desktop:28093 2015-09-23 01:22:36,797 INFO org.apache.tajo.querymaster.DefaultTaskScheduler: Assigned host : jihoon-desktop, Unknown Volume : -1, Concurrency : 4 2015-09-23 01:22:36,798 INFO org.apache.tajo.worker.TaskImpl: ta_1442938691159_0002_000001_000002_00 completed. Worker's task counter - total:3, succeeded: 2, killed: 0, failed: 1 2015-09-23 01:22:36,798 WARN org.apache.tajo.rpc.NettyClientBase: java.nio.channels.ClosedChannelException Try to reconnect :jihoon-desktop/127.0.1.1:28093 2015-09-23 01:22:36,801 INFO org.apache.tajo.querymaster.DefaultTaskScheduler: Assigned host : jihoon-desktop, Unknown Volume : -1, Concurrency : 5 2015-09-23 01:22:36,992 INFO org.apache.tajo.util.history.HistoryWriter: Saving query summary: hdfs://localhost:7020/tmp/tajo-jihoon/staging/history/20150923/query-detail/q_1442938691159_0002/query.hist 2015-09-23 01:22:37,026 INFO org.apache.tajo.util.history.HistoryWriter: Saving query unit: hdfs://localhost:7020/tmp/tajo-jihoon/staging/history/20150923/query-detail/q_1442938691159_0002/eb_1442938691159_0002_000001.hist 2015-09-23 01:22:37,042 INFO org.apache.tajo.util.history.HistoryWriter: Saving query unit: hdfs://localhost:7020/tmp/tajo-jihoon/staging/history/20150923/query-detail/q_1442938691159_0002/eb_1442938691159_0002_000002.hist 2015-09-23 01:22:37,559 INFO org.apache.tajo.util.history.HistoryWriter: Closing task history file: file:/tmp/tajo-jihoon/history/19700101/tasks/jihoon-desktop_28091/jihoon-desktop_28091_09_0.hist 2015-09-23 01:22:37,803 WARN org.apache.tajo.querymaster.QueryMasterManagerService: No QueryMasterTask: ta_1442938691159_0002_000001_000002_00 2015-09-23 01:22:46,797 FATAL org.apache.tajo.worker.TaskManager: java.util.concurrent.TimeoutException at org.apache.tajo.rpc.CallFuture.get(CallFuture.java:79) at org.apache.tajo.worker.TaskManager.createExecutionBlock(TaskManager.java:126) at org.apache.tajo.worker.TaskManager.handle(TaskManager.java:178) at org.apache.tajo.worker.TaskManager.handle(TaskManager.java:54) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:745)