Uploaded image for project: 'Apache Tez'
  1. Apache Tez
  2. TEZ-1852

WordCount in local mode hangs in shuffle phase

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.5.4
    • Component/s: None
    • Labels:
      None
    • Target Version/s:

      Description

      WordCount hangs at the progress of 50%, it looks like something wrong in the shuffle phase. Prakash Ramachandran, could you take at look at this ? I remember you had fixed this issue before. But now it comes out again, I verify that it works in 0.5.1, but it doesn't work starting from 0.5.2.

      here's the log in when I run wordcount

      2014-12-15 12:19:33,874 INFO  [Initializer 1] runtime.LogicalIOProcessorRuntimeTask (LogicalIOProcessorRuntimeTask.java:call(434)) - Initialized Output with dest edge: Output
      2014-12-15 12:19:33,874 INFO  [TezChild] runtime.LogicalIOProcessorRuntimeTask (LogicalIOProcessorRuntimeTask.java:initialize(223)) - All initializers finished
      2014-12-15 12:19:33,874 INFO  [TezChild] resources.MemoryDistributor (MemoryDistributor.java:makeInitialAllocations(124)) - Using Allocator class: org.apache.tez.runtime.library.resources.WeightedScalingMemoryDistributor
      2014-12-15 12:19:33,874 INFO  [TezChild] resources.WeightedScalingMemoryDistributor (WeightedScalingMemoryDistributor.java:getRequestTypeForClass(197)) - Falling back to RequestType.OTHER for class: org.apache.tez.mapreduce.output.MROutput
      2014-12-15 12:19:33,874 INFO  [TezChild] resources.WeightedScalingMemoryDistributor (WeightedScalingMemoryDistributor.java:initialProcessMemoryRequestContext(171)) - ScaleFactor: 1, for type: OTHER
      2014-12-15 12:19:33,874 INFO  [TezChild] resources.WeightedScalingMemoryDistributor (WeightedScalingMemoryDistributor.java:initialProcessMemoryRequestContext(171)) - ScaleFactor: 12, for type: SORTED_MERGED_INPUT
      2014-12-15 12:19:33,875 INFO  [TezChild] resources.WeightedScalingMemoryDistributor (WeightedScalingMemoryDistributor.java:computeReservedFraction(262)) - InitialReservationFraction=0.3, AdditionalReservationFractionForIOs=0.03, finalReserveFractionUsed=0.32999999999999996
      2014-12-15 12:19:33,875 INFO  [TezChild] resources.WeightedScalingMemoryDistributor (WeightedScalingMemoryDistributor.java:assignMemory(126)) - Scaling Requests. NumRequests: 2, numScaledRequests: 13, TotalRequested: 859255552, TotalRequestedScaled: 7.931589710769231E8, TotalJVMHeap: 954728448, TotalAvailable: 639668060, TotalRequested/TotalJVMHeap:0.90
      2014-12-15 12:19:33,875 INFO  [TezChild] resources.MemoryDistributor (MemoryDistributor.java:makeInitialAllocations(138)) - Informing: OUTPUT, Output, org.apache.tez.mapreduce.output.MROutput: requested=0, allocated=0
      2014-12-15 12:19:33,875 INFO  [TezChild] resources.MemoryDistributor (MemoryDistributor.java:makeInitialAllocations(138)) - Informing: INPUT, Tokenizer, org.apache.tez.runtime.library.input.OrderedGroupedKVInput: requested=859255552, allocated=639668060
      2014-12-15 12:19:33,875 INFO  [TezChild] runtime.LogicalIOProcessorRuntimeTask (LogicalIOProcessorRuntimeTask.java:initialize(245)) - Starting Inputs/Outputs
      2014-12-15 12:19:33,875 INFO  [TezChild] runtime.LogicalIOProcessorRuntimeTask (LogicalIOProcessorRuntimeTask.java:initialize(258)) - Input: Tokenizer being auto started by the framework. Subsequent instances will not be auto-started
      2014-12-15 12:19:33,875 INFO  [Initializer 0] runtime.LogicalIOProcessorRuntimeTask (LogicalIOProcessorRuntimeTask.java:call(402)) - Starting Input with src edge: Tokenizer
      2014-12-15 12:19:33,875 INFO  [TezChild] runtime.LogicalIOProcessorRuntimeTask (LogicalIOProcessorRuntimeTask.java:initialize(279)) - Num IOs determined for AutoStart: 1
      2014-12-15 12:19:33,875 INFO  [TezChild] runtime.LogicalIOProcessorRuntimeTask (LogicalIOProcessorRuntimeTask.java:initialize(281)) - Waiting for 1 IOs to start
      2014-12-15 12:19:33,882 INFO  [Initializer 0] common.TezRuntimeUtils (TezRuntimeUtils.java:instantiateCombiner(65)) - No combiner specified via tez.runtime.combiner.class. Combiner will not be used
      2014-12-15 12:19:33,883 INFO  [Initializer 0] orderedgrouped.Shuffle (Shuffle.java:<init>(179)) - Shuffle assigned with 1 inputs, codec: NoneifileReadAhead: true
      2014-12-15 12:19:33,886 INFO  [Initializer 0] orderedgrouped.ShuffleScheduler (ShuffleScheduler.java:<init>(150)) - ShuffleScheduler running for sourceVertex: Tokenizer with configuration: maxFetchFailuresBeforeReporting=5, reportReadErrorImmediately=true, maxFailedUniqueFetches=1, abortFailureLimit=30, maxMapRuntime=0
      2014-12-15 12:19:33,887 INFO  [Initializer 0] orderedgrouped.MergeManager (MergeManager.java:<init>(221)) - InitialRequest: ShuffleMem=859255552, postMergeMem=0, RuntimeTotalAvailable=639668060. Updated to: ShuffleMem=639668060, postMergeMem: 0
      2014-12-15 12:19:33,887 INFO  [Initializer 0] orderedgrouped.MergeManager (MergeManager.java:<init>(252)) - MergerManager: memoryLimit=639668060, maxSingleShuffleLimit=159917008, mergeThreshold=575701184, ioSortFactor=100, memToMemMergeOutputsThreshold=100
      2014-12-15 12:19:33,890 INFO  [Initializer 0] orderedgrouped.Shuffle (Shuffle.java:<init>(226)) - Num fetchers being started: 1
      2014-12-15 12:19:33,891 INFO  [Initializer 0] runtime.LogicalIOProcessorRuntimeTask (LogicalIOProcessorRuntimeTask.java:call(404)) - Started Input with src edge: Tokenizer
      2014-12-15 12:19:33,891 INFO  [TezChild] runtime.LogicalIOProcessorRuntimeTask (LogicalIOProcessorRuntimeTask.java:initialize(294)) - AutoStartComplete
      2014-12-15 12:19:33,891 INFO  [TezChild] task.TezTaskRunner (TezTaskRunner.java:run(175)) - Running task, taskAttemptId=attempt_1418617172198_0001_1_01_000000_0
      2014-12-15 12:19:33,891 INFO  [TezTaskEventRouter[attempt_1418617172198_0001_1_01_000000_0]] orderedgrouped.ShuffleInputEventHandlerOrderedGrouped (ShuffleInputEventHandlerOrderedGrouped.java:processDataMovementEvent(83)) - DME srcIdx: 0, targetIdx: 0, attemptNum: 0, payload: [host: jzhangMBPr.local, port: 0, pathComponent: attempt_1418617172198_0001_1_00_000000_0_10003, runDuration: 327458, ]
      2014-12-15 12:19:34,311 INFO  [main] client.DAGClientImpl (DAGClientImpl.java:log(463)) - DAG: State: RUNNING Progress: 50% TotalTasks: 2 Succeeded: 1 Running: 1 Failed: 0 Killed: 0
      2014-12-15 12:19:34,312 INFO  [main] client.DAGClientImpl (DAGClientImpl.java:log(463)) - 	VertexStatus: VertexName: Tokenizer Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0 Killed: 0
      2014-12-15 12:19:34,313 INFO  [main] client.DAGClientImpl (DAGClientImpl.java:log(463)) - 	VertexStatus: VertexName: Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 Killed: 0
      2014-12-15 12:19:39,330 INFO  [main] client.DAGClientImpl (DAGClientImpl.java:log(463)) - DAG: State: RUNNING Progress: 50% TotalTasks: 2 Succeeded: 1 Running: 1 Failed: 0 Killed: 0
      2014-12-15 12:19:39,331 INFO  [main] client.DAGClientImpl (DAGClientImpl.java:log(463)) - 	VertexStatus: VertexName: Tokenizer Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0 Killed: 0
      2014-12-15 12:19:39,331 INFO  [main] client.DAGClientImpl (DAGClientImpl.java:log(463)) - 	VertexStatus: VertexName: Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 Killed: 0
      2014-12-15 12:19:44,348 INFO  [main] client.DAGClientImpl (DAGClientImpl.java:log(463)) - DAG: State: RUNNING Progress: 50% TotalTasks: 2 Succeeded: 1 Running: 1 Failed: 0 Killed: 0
      2014-12-15 12:19:44,349 INFO  [main] client.DAGClientImpl (DAGClientImpl.java:log(463)) - 	VertexStatus: VertexName: Tokenizer Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0 Killed: 0
      2014-12-15 12:19:44,350 INFO  [main] client.DAGClientImpl (DAGClientImpl.java:log(463)) - 	VertexStatus: VertexName: Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 Killed: 0
      2014-12-15 12:19:49,365 INFO  [main] client.DAGClientImpl (DAGClientImpl.java:log(463)) - DAG: State: RUNNING Progress: 50% TotalTasks: 2 Succeeded: 1 Running: 1 Failed: 0 Killed: 0
      2014-12-15 12:19:49,367 INFO  [main] client.DAGClientImpl (DAGClientImpl.java:log(463)) - 	VertexStatus: VertexName: Tokenizer Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0 Killed: 0
      2014-12-15 12:19:49,369 INFO  [main] client.DAGClientImpl (DAGClientImpl.java:log(463)) - 	VertexStatus: VertexName: Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 Killed: 0
      2014-12-15 12:19:54,388 INFO  [main] client.DAGClientImpl (DAGClientImpl.java:log(463)) - DAG: State: RUNNING Progress: 50% TotalTasks: 2 Succeeded: 1 Running: 1 Failed: 0 Killed: 0
      2014-12-15 12:19:54,390 INFO  [main] client.DAGClientImpl (DAGClientImpl.java:log(463)) - 	VertexStatus: VertexName: Tokenizer Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0 Killed: 0
      2014-12-15 12:19:54,392 INFO  [main] client.DAGClientImpl (DAGClientImpl.java:log(463)) - 	VertexStatus: VertexName: Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 Killed: 0
      2014-12-15 12:19:59,406 INFO  [main] client.DAGClientImpl (DAGClientImpl.java:log(463)) - DAG: State: RUNNING Progress: 50% TotalTasks: 2 Succeeded: 1 Running: 1 Failed: 0 Killed: 0
      

      You can reproduce this by adding these 3 lines of code in WordCount

          tezConf.setBoolean(TezConfiguration.TEZ_LOCAL_MODE, true);
          tezConf.set("fs.defaultFS", "file:///");
          tezConf.setBoolean(TezRuntimeConfiguration.TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH, true);
      

        Attachments

        1. TEZ-1852.1.patch
          7 kB
          Prakash Ramachandran

          Issue Links

            Activity

              People

              • Assignee:
                pramachandran Prakash Ramachandran
                Reporter:
                zjffdu Jeff Zhang
              • Votes:
                0 Vote for this issue
                Watchers:
                4 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: