Uploaded image for project: 'Apache Tez'
  1. Apache Tez
  2. TEZ-1852

WordCount in local mode hangs in shuffle phase

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.5.4
    • Component/s: None
    • Labels:
      None
    • Target Version/s:

      Description

      WordCount hangs at the progress of 50%, it looks like something wrong in the shuffle phase. Prakash Ramachandran, could you take at look at this ? I remember you had fixed this issue before. But now it comes out again, I verify that it works in 0.5.1, but it doesn't work starting from 0.5.2.

      here's the log in when I run wordcount

      2014-12-15 12:19:33,874 INFO  [Initializer 1] runtime.LogicalIOProcessorRuntimeTask (LogicalIOProcessorRuntimeTask.java:call(434)) - Initialized Output with dest edge: Output
      2014-12-15 12:19:33,874 INFO  [TezChild] runtime.LogicalIOProcessorRuntimeTask (LogicalIOProcessorRuntimeTask.java:initialize(223)) - All initializers finished
      2014-12-15 12:19:33,874 INFO  [TezChild] resources.MemoryDistributor (MemoryDistributor.java:makeInitialAllocations(124)) - Using Allocator class: org.apache.tez.runtime.library.resources.WeightedScalingMemoryDistributor
      2014-12-15 12:19:33,874 INFO  [TezChild] resources.WeightedScalingMemoryDistributor (WeightedScalingMemoryDistributor.java:getRequestTypeForClass(197)) - Falling back to RequestType.OTHER for class: org.apache.tez.mapreduce.output.MROutput
      2014-12-15 12:19:33,874 INFO  [TezChild] resources.WeightedScalingMemoryDistributor (WeightedScalingMemoryDistributor.java:initialProcessMemoryRequestContext(171)) - ScaleFactor: 1, for type: OTHER
      2014-12-15 12:19:33,874 INFO  [TezChild] resources.WeightedScalingMemoryDistributor (WeightedScalingMemoryDistributor.java:initialProcessMemoryRequestContext(171)) - ScaleFactor: 12, for type: SORTED_MERGED_INPUT
      2014-12-15 12:19:33,875 INFO  [TezChild] resources.WeightedScalingMemoryDistributor (WeightedScalingMemoryDistributor.java:computeReservedFraction(262)) - InitialReservationFraction=0.3, AdditionalReservationFractionForIOs=0.03, finalReserveFractionUsed=0.32999999999999996
      2014-12-15 12:19:33,875 INFO  [TezChild] resources.WeightedScalingMemoryDistributor (WeightedScalingMemoryDistributor.java:assignMemory(126)) - Scaling Requests. NumRequests: 2, numScaledRequests: 13, TotalRequested: 859255552, TotalRequestedScaled: 7.931589710769231E8, TotalJVMHeap: 954728448, TotalAvailable: 639668060, TotalRequested/TotalJVMHeap:0.90
      2014-12-15 12:19:33,875 INFO  [TezChild] resources.MemoryDistributor (MemoryDistributor.java:makeInitialAllocations(138)) - Informing: OUTPUT, Output, org.apache.tez.mapreduce.output.MROutput: requested=0, allocated=0
      2014-12-15 12:19:33,875 INFO  [TezChild] resources.MemoryDistributor (MemoryDistributor.java:makeInitialAllocations(138)) - Informing: INPUT, Tokenizer, org.apache.tez.runtime.library.input.OrderedGroupedKVInput: requested=859255552, allocated=639668060
      2014-12-15 12:19:33,875 INFO  [TezChild] runtime.LogicalIOProcessorRuntimeTask (LogicalIOProcessorRuntimeTask.java:initialize(245)) - Starting Inputs/Outputs
      2014-12-15 12:19:33,875 INFO  [TezChild] runtime.LogicalIOProcessorRuntimeTask (LogicalIOProcessorRuntimeTask.java:initialize(258)) - Input: Tokenizer being auto started by the framework. Subsequent instances will not be auto-started
      2014-12-15 12:19:33,875 INFO  [Initializer 0] runtime.LogicalIOProcessorRuntimeTask (LogicalIOProcessorRuntimeTask.java:call(402)) - Starting Input with src edge: Tokenizer
      2014-12-15 12:19:33,875 INFO  [TezChild] runtime.LogicalIOProcessorRuntimeTask (LogicalIOProcessorRuntimeTask.java:initialize(279)) - Num IOs determined for AutoStart: 1
      2014-12-15 12:19:33,875 INFO  [TezChild] runtime.LogicalIOProcessorRuntimeTask (LogicalIOProcessorRuntimeTask.java:initialize(281)) - Waiting for 1 IOs to start
      2014-12-15 12:19:33,882 INFO  [Initializer 0] common.TezRuntimeUtils (TezRuntimeUtils.java:instantiateCombiner(65)) - No combiner specified via tez.runtime.combiner.class. Combiner will not be used
      2014-12-15 12:19:33,883 INFO  [Initializer 0] orderedgrouped.Shuffle (Shuffle.java:<init>(179)) - Shuffle assigned with 1 inputs, codec: NoneifileReadAhead: true
      2014-12-15 12:19:33,886 INFO  [Initializer 0] orderedgrouped.ShuffleScheduler (ShuffleScheduler.java:<init>(150)) - ShuffleScheduler running for sourceVertex: Tokenizer with configuration: maxFetchFailuresBeforeReporting=5, reportReadErrorImmediately=true, maxFailedUniqueFetches=1, abortFailureLimit=30, maxMapRuntime=0
      2014-12-15 12:19:33,887 INFO  [Initializer 0] orderedgrouped.MergeManager (MergeManager.java:<init>(221)) - InitialRequest: ShuffleMem=859255552, postMergeMem=0, RuntimeTotalAvailable=639668060. Updated to: ShuffleMem=639668060, postMergeMem: 0
      2014-12-15 12:19:33,887 INFO  [Initializer 0] orderedgrouped.MergeManager (MergeManager.java:<init>(252)) - MergerManager: memoryLimit=639668060, maxSingleShuffleLimit=159917008, mergeThreshold=575701184, ioSortFactor=100, memToMemMergeOutputsThreshold=100
      2014-12-15 12:19:33,890 INFO  [Initializer 0] orderedgrouped.Shuffle (Shuffle.java:<init>(226)) - Num fetchers being started: 1
      2014-12-15 12:19:33,891 INFO  [Initializer 0] runtime.LogicalIOProcessorRuntimeTask (LogicalIOProcessorRuntimeTask.java:call(404)) - Started Input with src edge: Tokenizer
      2014-12-15 12:19:33,891 INFO  [TezChild] runtime.LogicalIOProcessorRuntimeTask (LogicalIOProcessorRuntimeTask.java:initialize(294)) - AutoStartComplete
      2014-12-15 12:19:33,891 INFO  [TezChild] task.TezTaskRunner (TezTaskRunner.java:run(175)) - Running task, taskAttemptId=attempt_1418617172198_0001_1_01_000000_0
      2014-12-15 12:19:33,891 INFO  [TezTaskEventRouter[attempt_1418617172198_0001_1_01_000000_0]] orderedgrouped.ShuffleInputEventHandlerOrderedGrouped (ShuffleInputEventHandlerOrderedGrouped.java:processDataMovementEvent(83)) - DME srcIdx: 0, targetIdx: 0, attemptNum: 0, payload: [host: jzhangMBPr.local, port: 0, pathComponent: attempt_1418617172198_0001_1_00_000000_0_10003, runDuration: 327458, ]
      2014-12-15 12:19:34,311 INFO  [main] client.DAGClientImpl (DAGClientImpl.java:log(463)) - DAG: State: RUNNING Progress: 50% TotalTasks: 2 Succeeded: 1 Running: 1 Failed: 0 Killed: 0
      2014-12-15 12:19:34,312 INFO  [main] client.DAGClientImpl (DAGClientImpl.java:log(463)) - 	VertexStatus: VertexName: Tokenizer Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0 Killed: 0
      2014-12-15 12:19:34,313 INFO  [main] client.DAGClientImpl (DAGClientImpl.java:log(463)) - 	VertexStatus: VertexName: Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 Killed: 0
      2014-12-15 12:19:39,330 INFO  [main] client.DAGClientImpl (DAGClientImpl.java:log(463)) - DAG: State: RUNNING Progress: 50% TotalTasks: 2 Succeeded: 1 Running: 1 Failed: 0 Killed: 0
      2014-12-15 12:19:39,331 INFO  [main] client.DAGClientImpl (DAGClientImpl.java:log(463)) - 	VertexStatus: VertexName: Tokenizer Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0 Killed: 0
      2014-12-15 12:19:39,331 INFO  [main] client.DAGClientImpl (DAGClientImpl.java:log(463)) - 	VertexStatus: VertexName: Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 Killed: 0
      2014-12-15 12:19:44,348 INFO  [main] client.DAGClientImpl (DAGClientImpl.java:log(463)) - DAG: State: RUNNING Progress: 50% TotalTasks: 2 Succeeded: 1 Running: 1 Failed: 0 Killed: 0
      2014-12-15 12:19:44,349 INFO  [main] client.DAGClientImpl (DAGClientImpl.java:log(463)) - 	VertexStatus: VertexName: Tokenizer Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0 Killed: 0
      2014-12-15 12:19:44,350 INFO  [main] client.DAGClientImpl (DAGClientImpl.java:log(463)) - 	VertexStatus: VertexName: Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 Killed: 0
      2014-12-15 12:19:49,365 INFO  [main] client.DAGClientImpl (DAGClientImpl.java:log(463)) - DAG: State: RUNNING Progress: 50% TotalTasks: 2 Succeeded: 1 Running: 1 Failed: 0 Killed: 0
      2014-12-15 12:19:49,367 INFO  [main] client.DAGClientImpl (DAGClientImpl.java:log(463)) - 	VertexStatus: VertexName: Tokenizer Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0 Killed: 0
      2014-12-15 12:19:49,369 INFO  [main] client.DAGClientImpl (DAGClientImpl.java:log(463)) - 	VertexStatus: VertexName: Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 Killed: 0
      2014-12-15 12:19:54,388 INFO  [main] client.DAGClientImpl (DAGClientImpl.java:log(463)) - DAG: State: RUNNING Progress: 50% TotalTasks: 2 Succeeded: 1 Running: 1 Failed: 0 Killed: 0
      2014-12-15 12:19:54,390 INFO  [main] client.DAGClientImpl (DAGClientImpl.java:log(463)) - 	VertexStatus: VertexName: Tokenizer Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0 Killed: 0
      2014-12-15 12:19:54,392 INFO  [main] client.DAGClientImpl (DAGClientImpl.java:log(463)) - 	VertexStatus: VertexName: Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 Killed: 0
      2014-12-15 12:19:59,406 INFO  [main] client.DAGClientImpl (DAGClientImpl.java:log(463)) - DAG: State: RUNNING Progress: 50% TotalTasks: 2 Succeeded: 1 Running: 1 Failed: 0 Killed: 0
      

      You can reproduce this by adding these 3 lines of code in WordCount

          tezConf.setBoolean(TezConfiguration.TEZ_LOCAL_MODE, true);
          tezConf.set("fs.defaultFS", "file:///");
          tezConf.setBoolean(TezRuntimeConfiguration.TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH, true);
      
      1. TEZ-1852.1.patch
        7 kB
        Prakash Ramachandran

        Issue Links

          Activity

          Hide
          pramachandran Prakash Ramachandran added a comment -

          The OrderedPartitionedKVEdgeConfig will not pick up this config and hence wont be set in the userpayload. See TEZ-1587 for details. Since setFromConfiguration seemed to be unnecessary it was removed in TEZ-1614. would need to look at how to pass the conf values to the OrderedPartitionedKVEdgeConfig better. Siddharth Seth / Bikas Saha any thoughts on how to proceed with this one?

          Show
          pramachandran Prakash Ramachandran added a comment - The OrderedPartitionedKVEdgeConfig will not pick up this config and hence wont be set in the userpayload. See TEZ-1587 for details. Since setFromConfiguration seemed to be unnecessary it was removed in TEZ-1614 . would need to look at how to pass the conf values to the OrderedPartitionedKVEdgeConfig better. Siddharth Seth / Bikas Saha any thoughts on how to proceed with this one?
          Hide
          sseth Siddharth Seth added a comment -

          setFromConfiguration would need to be re-instated to get this to work. The edges would not get the appropriate configs otherwise. Does it hang, or eventually fail ?

          Show
          sseth Siddharth Seth added a comment - setFromConfiguration would need to be re-instated to get this to work. The edges would not get the appropriate configs otherwise. Does it hang, or eventually fail ?
          Hide
          pramachandran Prakash Ramachandran added a comment -

          it will eventually fail with shuffle reporting fetch failure.

          Show
          pramachandran Prakash Ramachandran added a comment - it will eventually fail with shuffle reporting fetch failure.
          Hide
          hitesh Hitesh Shah added a comment - - edited

          Siddharth Seth Prakash Ramachandran Looks like this needs to be backported to 0.5.x?

          Jeff Zhang Does this still occur if TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH is not set to true?

          Show
          hitesh Hitesh Shah added a comment - - edited Siddharth Seth Prakash Ramachandran Looks like this needs to be backported to 0.5.x? Jeff Zhang Does this still occur if TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH is not set to true?
          Hide
          zjffdu Jeff Zhang added a comment -

          Hitesh Shah, it is the same if TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH is not set to true

          Show
          zjffdu Jeff Zhang added a comment - Hitesh Shah , it is the same if TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH is not set to true
          Hide
          pramachandran Prakash Ramachandran added a comment -
          • added setFromConfiguration calls on edge config.
          Show
          pramachandran Prakash Ramachandran added a comment - added setFromConfiguration calls on edge config.
          Hide
          pramachandran Prakash Ramachandran added a comment - - edited

          Hitesh Shah if TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH is not set to true, the fetcher will try to use the http connect which will fail for local mode.

          will add the patch for 0.5.x once the patch is reviewed.

          Show
          pramachandran Prakash Ramachandran added a comment - - edited Hitesh Shah if TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH is not set to true, the fetcher will try to use the http connect which will fail for local mode. will add the patch for 0.5.x once the patch is reviewed.
          Hide
          sseth Siddharth Seth added a comment -

          +1. Looks good to me.

          Show
          sseth Siddharth Seth added a comment - +1. Looks good to me.
          Hide
          sseth Siddharth Seth added a comment -

          Committed to master and branch-0.5. Thanks Prakash Ramachandran

          Show
          sseth Siddharth Seth added a comment - Committed to master and branch-0.5. Thanks Prakash Ramachandran
          Hide
          hitesh Hitesh Shah added a comment -

          Closing issue as 0.5.4, 0.6.1 and 0.7.0 have been released.

          Show
          hitesh Hitesh Shah added a comment - Closing issue as 0.5.4, 0.6.1 and 0.7.0 have been released.

            People

            • Assignee:
              pramachandran Prakash Ramachandran
              Reporter:
              zjffdu Jeff Zhang
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development