  1. Tajo
  2. TAJO-1560

HashShuffle report should be ignored when it includes no succeeded tasks

    Details

      Description

      Currently, the hash shuffle report is always sent to the stage. If one worker runs all of the tasks very quickly, the other workers receive a shouldDie message without having executed any task, but they still send a report.
      Additionally, range shuffle does not need a hash shuffle report; waiting for it is unnecessary.

      2015-04-16 02:05:49,063 INFO org.apache.tajo.querymaster.Stage: Stage finalize - eb_1429088098190_1356_000001 (total=3, success=3, killed=0)
      2015-04-16 02:05:49,063 INFO org.apache.tajo.querymaster.DefaultTaskScheduler: TaskScheduler schedulingThread stopped
      2015-04-16 02:05:49,064 INFO org.apache.tajo.querymaster.DefaultTaskScheduler: Task Scheduler stopped
      2015-04-16 02:05:49,064 INFO org.apache.tajo.querymaster.QueryMaster: cleanup executionBlocks: 
      2015-04-16 02:05:49,064 INFO org.apache.tajo.worker.TaskRunner: Received ShouldDie flag:eb_1429088098190_1356_000001,container_1429088098190_1356_01_058889
      2015-04-16 02:05:49,064 INFO org.apache.tajo.worker.TaskRunner: Stop TaskRunner: eb_1429088098190_1356_000001,container_1429088098190_1356_01_058889
      2015-04-16 02:05:49,064 INFO org.apache.tajo.worker.TaskRunnerManager: Stop Task:eb_1429088098190_1356_000001,container_1429088098190_1356_01_058889
      2015-04-16 02:05:49,065 INFO org.apache.tajo.querymaster.Stage: eb_1429088098190_1356_000001, waiting for shuffle reports. expected Tasks:3
      2015-04-16 02:05:49,066 INFO org.apache.tajo.worker.TaskRunnerManager: ======================== Processing eb_1429088098190_1356_000001 of type STOP
      2015-04-16 02:05:49,066 INFO org.apache.tajo.storage.HashShuffleAppenderManager: Close HashShuffleAppender:eb_1429088098190_1356_000001, not a hash shuffle
      2015-04-16 02:05:49,066 INFO org.apache.tajo.storage.HashShuffleAppenderManager: Close HashShuffleAppender:eb_1429088098190_1356_000001, not a hash shuffle
      2015-04-16 02:05:49,066 INFO org.apache.tajo.worker.TaskRunnerManager: Stopped execution block:eb_1429088098190_1356_000001
      2015-04-16 02:05:49,066 INFO org.apache.tajo.querymaster.Stage: eb_1429088098190_1356_000001, Received shuffle report: 2/3
      2015-04-16 02:05:49,066 INFO org.apache.tajo.querymaster.Stage: eb_1429088098190_1356_000001, Finalized shuffle reports: 3
      2015-04-16 02:05:49,066 INFO org.apache.tajo.querymaster.Stage: Stage completed - eb_1429088098190_1356_000001 (total=3, success=3, killed=0)
      2015-04-16 02:05:49,066 INFO org.apache.tajo.querymaster.Query: Processing q_1429088098190_1356 of type STAGE_COMPLETED
      2015-04-16 02:05:49,066 INFO org.apache.tajo.querymaster.Stage: eb_1429088098190_1356_000002, Outer volume: 0.0MB, Inner volume: 1.0MB
      2015-04-16 02:05:49,066 INFO org.apache.tajo.querymaster.Stage: eb_1429088098190_1356_000002, Bigger Table's volume is approximately 1 MB
      2015-04-16 02:05:49,066 INFO org.apache.tajo.querymaster.Stage: eb_1429088098190_1356_000002, The determined number of join partitions is 1
      2015-04-16 02:05:49,066 INFO org.apache.tajo.querymaster.Stage: org.apache.tajo.querymaster.DefaultTaskScheduler is chosen for the task scheduling for eb_1429088098190_1356_000002
      2015-04-16 02:05:49,066 INFO org.apache.tajo.querymaster.Query: Scheduling Stage:eb_1429088098190_1356_000002
      2015-04-16 02:05:49,068 INFO org.apache.tajo.storage.FileStorageManager: Total input paths to process : 11
      2015-04-16 02:05:49,068 ERROR org.apache.tajo.querymaster.Stage: Can't handle this event at current state, eventType:SQ_SHUFFLE_REPORT, oldState:SUCCEEDED, nextState:SUCCEEDED
      org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: SQ_SHUFFLE_REPORT at SUCCEEDED
      	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
      	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
      	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
      	at org.apache.tajo.querymaster.Stage.handle(Stage.java:743)
      	at org.apache.tajo.querymaster.QueryMasterTask$StageEventDispatcher.handle(QueryMasterTask.java:226)
      	at org.apache.tajo.querymaster.QueryMasterTask$StageEventDispatcher.handle(QueryMasterTask.java:220)
      	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
      	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
      	at java.lang.Thread.run(Thread.java:745)
      2015-04-16 02:05:49,068 INFO org.apache.tajo.querymaster.QueryMaster: cleanup executionBlocks: 
      2015-04-16 02:05:49,069 INFO org.apache.tajo.querymaster.Query: Processing q_1429088098190_1356 of type STAGE_COMPLETED
      2015-04-16 02:05:49,069 INFO org.apache.tajo.querymaster.Query: Processing q_1429088098190_1356 of type QUERY_COMPLETED
      2015-04-16 02:05:49,069 INFO org.apache.tajo.querymaster.Query: q_1429088098190_1356 Query Transitioned from QUERY_RUNNING to QUERY_ERROR
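
The log above shows a shuffle report reaching a stage that is already SUCCEEDED. The following is a minimal sketch of the behavior the issue title asks for, not the committed patch: `ShuffleReportSketch`, its `State` enum, and `onShuffleReport` are hypothetical names used only for illustration. It assumes a report carries the number of succeeded tasks and that a report with no succeeded tasks, or one arriving after completion, can simply be dropped.

```java
/** Hypothetical, simplified stand-in for a stage's report bookkeeping; not Tajo's Stage class. */
class ShuffleReportSketch {
  enum State { FINALIZING, SUCCEEDED }

  private final int succeededTaskCount; // number of succeeded tasks the stage expects reports for
  private int reportedTasks = 0;        // succeeded tasks covered by the reports counted so far
  private State state = State.FINALIZING;

  ShuffleReportSketch(int succeededTaskCount) {
    this.succeededTaskCount = succeededTaskCount;
  }

  /** Handles one worker report; returns true if it was counted, false if it was ignored. */
  synchronized boolean onShuffleReport(int succeededTasksInReport) {
    // A worker that was stopped before running any task has nothing to contribute.
    if (succeededTasksInReport == 0) {
      return false;
    }
    // A late report is dropped instead of triggering an invalid state transition.
    if (state == State.SUCCEEDED) {
      return false;
    }
    reportedTasks += succeededTasksInReport;
    if (reportedTasks >= succeededTaskCount) {
      state = State.SUCCEEDED;
    }
    return true;
  }

  synchronized State getState() {
    return state;
  }
}
```

With three expected tasks, reports of 2 and then 1 succeeded tasks complete the stage in this sketch, and a later empty report from an idle worker is ignored rather than treated as an error.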
      
      1. TAJO-1560-branch-0.10.1.patch
        26 kB
        Jinho Kim
      2. TAJO-1560.patch
        25 kB
        Jinho Kim

          Activity

          hudson Hudson added a comment -

          SUCCESS: Integrated in Tajo-master-build #677 (See https://builds.apache.org/job/Tajo-master-build/677/)
          TAJO-1560: HashShuffle report should be ignored when a succeed tasks are not included. (jinho) (jhkim: rev 1f72d11f1d2bd48e895cbeb8a7228a854633fe2b)

          • tajo-core/src/main/java/org/apache/tajo/worker/TaskRunnerManager.java
          • tajo-core/src/main/java/org/apache/tajo/querymaster/Stage.java
          • tajo-core/src/main/java/org/apache/tajo/worker/ExecutionBlockContext.java
          • tajo-core/src/test/java/org/apache/tajo/querymaster/TestKillQuery.java
          • tajo-core/src/main/java/org/apache/tajo/master/TajoContainerProxy.java
          • tajo-core/src/main/proto/TajoWorkerProtocol.proto
          • CHANGES
          • tajo-core/src/main/java/org/apache/tajo/worker/TajoWorkerManagerService.java
          • tajo-core/src/main/java/org/apache/tajo/util/history/HistoryWriter.java
          • tajo-core/src/main/java/org/apache/tajo/worker/event/TaskRunnerStartEvent.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Tajo-master-CODEGEN-build #315 (See https://builds.apache.org/job/Tajo-master-CODEGEN-build/315/)
          TAJO-1560: HashShuffle report should be ignored when a succeed tasks are not included. (jinho) (jhkim: rev 1f72d11f1d2bd48e895cbeb8a7228a854633fe2b)

          • tajo-core/src/main/java/org/apache/tajo/util/history/HistoryWriter.java
          • tajo-core/src/main/java/org/apache/tajo/worker/event/TaskRunnerStartEvent.java
          • tajo-core/src/main/proto/TajoWorkerProtocol.proto
          • tajo-core/src/test/java/org/apache/tajo/querymaster/TestKillQuery.java
          • tajo-core/src/main/java/org/apache/tajo/master/TajoContainerProxy.java
          • CHANGES
          • tajo-core/src/main/java/org/apache/tajo/worker/ExecutionBlockContext.java
          • tajo-core/src/main/java/org/apache/tajo/worker/TaskRunnerManager.java
          • tajo-core/src/main/java/org/apache/tajo/worker/TajoWorkerManagerService.java
          • tajo-core/src/main/java/org/apache/tajo/querymaster/Stage.java
          jhkim Jinho Kim added a comment -

          committed it
          Thanks

          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/tajo/pull/538

          tajoqa Tajo QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12726430/TAJO-1560-branch-0.10.1.patch
          against master revision release-0.9.0-rc0-263-gad596bb.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-TAJO-Build/746//console

          This message is automatically generated.

          githubbot ASF GitHub Bot added a comment -

          Github user jinossy commented on the pull request:

          https://github.com/apache/tajo/pull/538#issuecomment-94254947

          Thank you for your review.
          I will commit it soon, after I upload the patch for branch-0.10.1 to JIRA.

          githubbot ASF GitHub Bot added a comment -

          Github user jihoonson commented on the pull request:

          https://github.com/apache/tajo/pull/538#issuecomment-94173533

          +1. I just have a minor comment. Please address it before committing.

          githubbot ASF GitHub Bot added a comment -

          Github user jihoonson commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/538#discussion_r28644583

          --- Diff: tajo-core/src/main/java/org/apache/tajo/querymaster/Stage.java ---
          @@ -1300,6 +1301,52 @@ protected void stopFinalization() {
               stopShuffleReceiver.set(true);
             }

          +  private void finalizeShuffleReport(StageShuffleReportEvent event, ShuffleType type) {
          +    if(!checkIfNeedFinalizing(type)) return;
          +
          +    TajoWorkerProtocol.ExecutionBlockReport report = event.getReport();
          +
          +    if (!report.getReportSuccess()) {
          +      stopFinalization();
          +      LOG.error(getId() + ", " + type + " report are failed. Caused by:" + report.getReportErrorMessage());
          +      eventHandler.handle(new StageEvent(getId(), StageEventType.SQ_FAILED));
          +    }
          +
          +    completedShuffleTasks.addAndGet(report.getSucceededTasks());
          +    if (report.getIntermediateEntriesCount() > 0) {
          +      for (IntermediateEntryProto eachInterm : report.getIntermediateEntriesList()) {
          +        hashShuffleIntermediateEntries.add(new IntermediateEntry(eachInterm));
          +      }
          +    }
          +
          +    if (completedShuffleTasks.get() >= succeededObjectCount) {
          +      LOG.info(getId() + ", Finalized " + type + " reports: " + completedShuffleTasks.get());
          +      eventHandler.handle(new StageEvent(getId(), StageEventType.SQ_STAGE_COMPLETED));
          +      if (timeoutChecker != null) {
          +        stopFinalization();
          +        synchronized (timeoutChecker) {
          +          timeoutChecker.notifyAll();
          +        }
          +      }
          +    } else {
          +      LOG.info(getId() + ", Received " + type + " reports " +
          +          completedShuffleTasks.get() + "/" + succeededObjectCount);
          +    }
          +  }
          +
          +  /**
          +   * HASH_SHUFFLE, SCATTERED_HASH_SHUFFLE should get report from worker nodes when ExecutionBlock is stopping.
          --- End diff --

          It would be great if you add a comment that describes why we don't need to collect reports when the shuffle type is RANGE_SHUFFLE.
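
For illustration only, here is a rough sketch of the kind of shuffle-type check the quoted diff refers to. The body of `checkIfNeedFinalizing` is not shown in this thread, so the class, enum, and method below are hypothetical stand-ins; the only facts taken from the source are that HASH_SHUFFLE and SCATTERED_HASH_SHUFFLE should get reports from worker nodes and that range shuffle does not need one.

```java
/** Hypothetical sketch; the enum only mirrors the shuffle type names mentioned in this issue. */
class ShuffleTypeSketch {
  enum ShuffleType { HASH_SHUFFLE, SCATTERED_HASH_SHUFFLE, RANGE_SHUFFLE }

  /** Returns true only for the shuffle types whose stage should wait for worker reports. */
  static boolean needsShuffleReport(ShuffleType type) {
    switch (type) {
      case HASH_SHUFFLE:
      case SCATTERED_HASH_SHUFFLE:
        return true;
      default:
        // RANGE_SHUFFLE: per the issue description, waiting for a report is unnecessary.
        return false;
    }
  }
}
```

A stage using such a check could skip the report-waiting step entirely for RANGE_SHUFFLE and move straight to completion.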

          tajoqa Tajo QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12726147/TAJO-1560.patch
          against master revision release-0.9.0-rc0-258-gd2a4f9b.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The applied patch does not increase the total number of javadoc warnings.

          +1 checkstyle. The patch generated 0 code style errors.

          -1 findbugs. The patch appears to introduce 18 new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in tajo-core:
          org.apache.tajo.client.TestTajoClient

          Test results: https://builds.apache.org/job/PreCommit-TAJO-Build/740//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/740//artifact/incubator-tajo/patchprocess/newPatchFindbugsWarningstajo-core.html
          Console output: https://builds.apache.org/job/PreCommit-TAJO-Build/740//console

          This message is automatically generated.

          githubbot ASF GitHub Bot added a comment -

          GitHub user jinossy opened a pull request:

          https://github.com/apache/tajo/pull/538

          TAJO-1560: HashShuffle report should be ignored when a succeed tasks are not included

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/jinossy/tajo TAJO-1560

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/tajo/pull/538.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #538


          commit bb34d42799ab92495f92beaa741c90aa53396367
          Author: Jinho Kim <jhkim@apache.org>
          Date: 2015-04-17T09:34:28Z

          add shuffle type

          commit 24b291437fec9ed3796cd3ae57bd6142383cb51e
          Author: Jinho Kim <jhkim@apache.org>
          Date: 2015-04-17T12:27:32Z

          TAJO-1560: HashShuffle report should be ignored when a succeed tasks are not included



            People

            • Assignee:
              jhkim Jinho Kim
              Reporter:
              jhkim Jinho Kim