Uploaded image for project: 'Tajo'
  1. Tajo
  2. TAJO-1875

Resource leak after a query failure

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.11.0
    • Component/s: Scheduler
    • Labels:
      None

      Description

      I'm testing the join on hdfs and JDBC storage, and found that the worker resource is not collected properly after a query failure.

      Here is the log.

      tpch> select
          l_orderkey,
          sum(l_extendedprice*(1-l_discount)) as revenue,
          o_orderdate,
          o_shippriority
      from
          pgsql_db.customer as c
          join pgsql_db.orders as o
              on c.c_mktsegment = 'BUILDING' and c.c_custkey = o.o_custkey
          join lineitem as l
              on l.l_orderkey = o.o_orderkey
      where
          o_orderdate < '1995-03-15'::date
          and l_shipdate > '1995-03-15'::date
      group by
          l_orderkey, o_orderdate, o_shippriority
      order by
          revenue desc, o_orderdate;
      ...
      2015-09-23 01:22:36,776 INFO org.apache.tajo.worker.TaskImpl: Initializing: ta_1442938691159_0002_000002_000000_02
      2015-09-23 01:22:36,777 INFO org.apache.tajo.worker.TaskImpl: ==================================
      2015-09-23 01:22:36,777 INFO org.apache.tajo.worker.TaskImpl: * Stage ta_1442938691159_0002_000002_000000_02 is initialized
      2015-09-23 01:22:36,777 INFO org.apache.tajo.worker.TaskImpl: * InterQuery: true, Use HASH_SHUFFLE shuffle, Fragments (num: 1), Fetches (total:0) :
      2015-09-23 01:22:36,777 INFO org.apache.tajo.worker.TaskImpl: * Local task dir: file:/disk2/tajo-tmp/q_1442938691159_0002/output/2/0_2
      2015-09-23 01:22:36,777 INFO org.apache.tajo.worker.TaskImpl: ==================================
      2015-09-23 01:22:36,778 INFO org.apache.tajo.storage.jdbc.JdbcScanner: Generated SQL: SELECT o_orderkey,o_custkey,o_orderdate,o_shippriority FROM orders WHERE o.o_orderdate < DATE '1995-03-15'
      2015-09-23 01:22:36,783 ERROR org.apache.tajo.worker.TaskImpl: internal error: ERROR: missing FROM-clause entry for table "o"
        Position: 74
      org.apache.tajo.exception.TajoInternalError: internal error: ERROR: missing FROM-clause entry for table "o"
        Position: 74
              at org.apache.tajo.storage.jdbc.JdbcScanner.executeQueryAndGetIter(JdbcScanner.java:263)
              at org.apache.tajo.storage.jdbc.JdbcScanner.next(JdbcScanner.java:116)
              at org.apache.tajo.engine.planner.physical.FullScanIterator.hasNext(FullScanIterator.java:39)
              at org.apache.tajo.engine.planner.physical.SeqScanExec.next(SeqScanExec.java:292)
              at org.apache.tajo.engine.planner.physical.HashShuffleFileWriteExec.next(HashShuffleFileWriteExec.java:102)
              at org.apache.tajo.worker.TaskImpl.run(TaskImpl.java:405)
              at org.apache.tajo.worker.TaskContainer.run(TaskContainer.java:65)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
              at java.util.concurrent.FutureTask.run(FutureTask.java:262)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
              at java.lang.Thread.run(Thread.java:745)
      2015-09-23 01:22:36,783 INFO org.apache.tajo.worker.TaskImpl: ta_1442938691159_0002_000002_000000_02 completed. Worker's task counter - total:3, succeeded: 0, killed: 0, failed: 3
      2015-09-23 01:22:36,784 ERROR org.apache.tajo.querymaster.TaskAttempt: ta_1442938691159_0002_000002_000000_02 FROM jihoon-desktop >> internal error: ERROR: missing FROM-clause entry for table "o"
        Position: 74
      2015-09-23 01:22:36,784 INFO org.apache.tajo.querymaster.Task: ====================================================================================
      2015-09-23 01:22:36,784 INFO org.apache.tajo.querymaster.Task: >>> Task Failed: ta_1442938691159_0002_000002_000000_02, retry:false, attempts:3 <<<
      2015-09-23 01:22:36,784 INFO org.apache.tajo.querymaster.Task: ====================================================================================
      2015-09-23 01:22:36,784 INFO org.apache.tajo.querymaster.DefaultTaskScheduler: TaskScheduler schedulingThread stopped
      2015-09-23 01:22:36,784 INFO org.apache.tajo.querymaster.DefaultTaskScheduler: Task Scheduler stopped
      2015-09-23 01:22:36,785 INFO org.apache.tajo.querymaster.Stage: Stage completed - eb_1442938691159_0002_000002 (total=1, success=0, killed=0)
      2015-09-23 01:22:36,785 INFO org.apache.tajo.querymaster.DefaultTaskScheduler: Task Scheduler stopped
      2015-09-23 01:22:36,785 INFO org.apache.tajo.querymaster.Query: Processing q_1442938691159_0002 of type STAGE_COMPLETED
      2015-09-23 01:22:36,785 INFO org.apache.tajo.querymaster.Query: Complete Stage[eb_1442938691159_0002_000002], State: FAILED, 1/2.
      2015-09-23 01:22:36,785 INFO org.apache.tajo.querymaster.DefaultTaskScheduler: Assigned host : jihoon-desktop, Unknown Volume : -1, Concurrency : 1
      2015-09-23 01:22:36,785 INFO org.apache.tajo.storage.HashShuffleAppenderManager: Close HashShuffleAppender:eb_1442938691159_0002_000002, not a hash shuffle
      2015-09-23 01:22:36,786 ERROR org.apache.tajo.querymaster.Stage: Can't handle this event at current state, eventType:SQ_SHUFFLE_REPORT, oldState:FAILED, nextState:FAILED
      org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: SQ_SHUFFLE_REPORT at FAILED
              at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
              at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
              at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
              at org.apache.tajo.querymaster.Stage.handle(Stage.java:797)
              at org.apache.tajo.querymaster.QueryMasterTask$StageEventDispatcher.handle(QueryMasterTask.java:210)
              at org.apache.tajo.querymaster.QueryMasterTask$StageEventDispatcher.handle(QueryMasterTask.java:204)
              at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
              at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
              at java.lang.Thread.run(Thread.java:745)
      2015-09-23 01:22:36,786 INFO org.apache.tajo.querymaster.DefaultTaskScheduler: Task Scheduler stopped
      2015-09-23 01:22:36,786 INFO org.apache.tajo.worker.TaskManager: Stopped execution block:eb_1442938691159_0002_000002
      2015-09-23 01:22:36,787 INFO org.apache.tajo.querymaster.Query: Processing q_1442938691159_0002 of type STAGE_COMPLETED
      2015-09-23 01:22:36,787 INFO org.apache.tajo.querymaster.Query: Complete Stage[eb_1442938691159_0002_000002], State: ERROR, 2/2.
      2015-09-23 01:22:36,787 INFO org.apache.tajo.querymaster.Query: Processing q_1442938691159_0002 of type QUERY_COMPLETED
      2015-09-23 01:22:36,787 INFO org.apache.tajo.querymaster.Query: q_1442938691159_0002 Query Transitioned from QUERY_RUNNING to QUERY_ERROR
      2015-09-23 01:22:36,787 INFO org.apache.tajo.querymaster.QueryMasterTask: Query completion notified from q_1442938691159_0002 final state: QUERY_ERROR
      2015-09-23 01:22:36,787 INFO org.apache.tajo.worker.TaskImpl: Task basedir is created (file:/disk2/tajo-tmp/q_1442938691159_0002/output/1)
      2015-09-23 01:22:36,787 INFO org.apache.tajo.querymaster.DefaultTaskScheduler: Assigned host : jihoon-desktop, Unknown Volume : -1, Concurrency : 2
      2015-09-23 01:22:36,787 INFO org.apache.tajo.worker.TaskImpl: Initializing: ta_1442938691159_0002_000001_000002_00
      2015-09-23 01:22:36,788 INFO org.apache.tajo.worker.TaskImpl: ==================================
      2015-09-23 01:22:36,788 INFO org.apache.tajo.querymaster.QueryMasterTask: Stopping QueryMasterTask:q_1442938691159_0002
      2015-09-23 01:22:36,788 INFO org.apache.tajo.worker.TaskImpl: * Stage ta_1442938691159_0002_000001_000002_00 is initialized
      2015-09-23 01:22:36,789 INFO org.apache.tajo.worker.TaskImpl: * InterQuery: true, Use HASH_SHUFFLE shuffle, Fragments (num: 1), Fetches (total:0) :
      2015-09-23 01:22:36,789 INFO org.apache.tajo.worker.TaskImpl: * Local task dir: file:/disk2/tajo-tmp/q_1442938691159_0002/output/1/2_0
      2015-09-23 01:22:36,789 INFO org.apache.tajo.worker.TaskImpl: ==================================
      2015-09-23 01:22:36,789 INFO org.apache.tajo.querymaster.QueryMasterTask: Cleanup resources of all workers. Query: q_1442938691159_0002, workers: 1
      2015-09-23 01:22:36,789 INFO org.apache.tajo.querymaster.QueryMasterTask: Stopped QueryMasterTask:q_1442938691159_0002
      2015-09-23 01:22:36,790 INFO org.apache.tajo.worker.TaskImpl: Task basedir is created (file:/disk2/tajo-tmp/q_1442938691159_0002/output/1)
      2015-09-23 01:22:36,790 INFO org.apache.tajo.worker.TaskImpl: Initializing: ta_1442938691159_0002_000001_000003_00
      2015-09-23 01:22:36,792 INFO org.apache.tajo.querymaster.DefaultTaskScheduler: Assigned host : jihoon-desktop, Unknown Volume : -1, Concurrency : 3
      2015-09-23 01:22:36,792 INFO org.apache.tajo.worker.TaskImpl: ==================================
      2015-09-23 01:22:36,792 INFO org.apache.tajo.worker.TaskImpl: * Stage ta_1442938691159_0002_000001_000003_00 is initialized
      2015-09-23 01:22:36,792 INFO org.apache.tajo.worker.TaskImpl: * InterQuery: true, Use HASH_SHUFFLE shuffle, Fragments (num: 1), Fetches (total:0) :
      2015-09-23 01:22:36,792 INFO org.apache.tajo.worker.TaskImpl: * Local task dir: file:/disk2/tajo-tmp/q_1442938691159_0002/output/1/3_0
      2015-09-23 01:22:36,792 INFO org.apache.tajo.worker.TaskImpl: ==================================
      2015-09-23 01:22:36,795 INFO org.apache.tajo.worker.TaskManager: QueryMaster Address:jihoon-desktop:28093
      2015-09-23 01:22:36,797 INFO org.apache.tajo.querymaster.DefaultTaskScheduler: Assigned host : jihoon-desktop, Unknown Volume : -1, Concurrency : 4
      2015-09-23 01:22:36,798 INFO org.apache.tajo.worker.TaskImpl: ta_1442938691159_0002_000001_000002_00 completed. Worker's task counter - total:3, succeeded: 2, killed: 0, failed: 1
      2015-09-23 01:22:36,798 WARN org.apache.tajo.rpc.NettyClientBase: java.nio.channels.ClosedChannelException Try to reconnect :jihoon-desktop/127.0.1.1:28093
      2015-09-23 01:22:36,801 INFO org.apache.tajo.querymaster.DefaultTaskScheduler: Assigned host : jihoon-desktop, Unknown Volume : -1, Concurrency : 5
      2015-09-23 01:22:36,992 INFO org.apache.tajo.util.history.HistoryWriter: Saving query summary: hdfs://localhost:7020/tmp/tajo-jihoon/staging/history/20150923/query-detail/q_1442938691159_0002/query.hist
      2015-09-23 01:22:37,026 INFO org.apache.tajo.util.history.HistoryWriter: Saving query unit: hdfs://localhost:7020/tmp/tajo-jihoon/staging/history/20150923/query-detail/q_1442938691159_0002/eb_1442938691159_0002_000001.hist
      2015-09-23 01:22:37,042 INFO org.apache.tajo.util.history.HistoryWriter: Saving query unit: hdfs://localhost:7020/tmp/tajo-jihoon/staging/history/20150923/query-detail/q_1442938691159_0002/eb_1442938691159_0002_000002.hist
      2015-09-23 01:22:37,559 INFO org.apache.tajo.util.history.HistoryWriter: Closing task history file: file:/tmp/tajo-jihoon/history/19700101/tasks/jihoon-desktop_28091/jihoon-desktop_28091_09_0.hist
      2015-09-23 01:22:37,803 WARN org.apache.tajo.querymaster.QueryMasterManagerService: No QueryMasterTask: ta_1442938691159_0002_000001_000002_00
      2015-09-23 01:22:46,797 FATAL org.apache.tajo.worker.TaskManager:
      java.util.concurrent.TimeoutException
              at org.apache.tajo.rpc.CallFuture.get(CallFuture.java:79)
              at org.apache.tajo.worker.TaskManager.createExecutionBlock(TaskManager.java:126)
              at org.apache.tajo.worker.TaskManager.handle(TaskManager.java:178)
              at org.apache.tajo.worker.TaskManager.handle(TaskManager.java:54)
              at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
              at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
              at java.lang.Thread.run(Thread.java:745)
      

        Attachments

          Activity

            People

            • Assignee:
              jihoonson Jihoon Son
              Reporter:
              jihoonson Jihoon Son
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: