  1. Apache Tez
  2. TEZ-15 Support for DAG AM recovery
  3. TEZ-2544

Incorrect DAG result due to wrong TaskSpec during recovery


    Details

    • Type: Sub-task
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
    • Target Version/s:

      Description

      Expected TaskSpec

      DAGName : OrderedWordCount, VertexName: Summation, VertexParallelism: 1, TaskAttemptID:attempt_1433850314856_0019_1_01_000000_0, processorName=org.apache.tez.examples.OrderedWordCount$SumProcessor, inputSpecListSize=1, 
      outputSpecListSize=1, inputSpecList=[{{ sourceVertexName=Tokenizer, physicalEdgeCount=2, inputClassName=org.apache.tez.runtime.library.input.OrderedGroupedKVInput }}, ], outputSpecList=[{{ destinationVertexName=Sorter, physicalEdgeCount=1, outputClassName=org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput }}
      

      Actual TaskSpec

      DAGName : OrderedWordCount, VertexName: Summation, VertexParallelism: 1, TaskAttemptID:attempt_1433850314856_0019_1_01_000000_0, processorName=org.apache.tez.examples.OrderedWordCount$SumProcessor, inputSpecListSize=1, 
      outputSpecListSize=1, inputSpecList=[{{ sourceVertexName=Tokenizer, physicalEdgeCount=1, inputClassName=org.apache.tez.runtime.library.input.OrderedGroupedKVInput }}, ], outputSpecList=[{{ destinationVertexName=Sorter, physicalEdgeCount=1, outputClassName=org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput }}
      

      The expected physicalEdgeCount is 2, but the actual value is 1. This happens when dynamic parallelism estimation is enabled.

      The cause is that the Task is recovered while its vertex's source edge manager has not yet been updated from ScatterGatherEdgeManager to CustomShuffleEdgeManager, which results in a different physicalEdgeCount in the InputSpec.
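
      To illustrate why the two edge managers can report different counts, here is a minimal sketch of the arithmetic. It is a simplified model, not the actual Tez classes: the class EdgeCountSketch, its method names, and the numbers used (1 Tokenizer task, Summation originally planned with 2 partitions and reduced to 1 task) are assumptions chosen only because they reproduce the 2 vs. 1 values seen in the TaskSpecs above.

```java
// Simplified model of the two counting rules. Hypothetical names, not Tez code.
final class EdgeCountSketch {

    /** Plain scatter-gather routing: one physical input per source task. */
    static int scatterGatherInputs(int numSourceTasks) {
        return numSourceTasks;
    }

    /** Number of destination tasks left after merging partitions into ranges. */
    static int finalParallelism(int originalPartitions, int baseRange) {
        return (originalPartitions + baseRange - 1) / baseRange; // ceiling division
    }

    /**
     * Routing after parallelism was reduced at runtime: each remaining
     * destination task reads a range of the original partitions from every
     * source task. The last task may get a shorter (remainder) range.
     */
    static int mergedRangeInputs(int numSourceTasks, int originalPartitions,
                                 int baseRange, int destinationTaskIndex) {
        int lastIndex = finalParallelism(originalPartitions, baseRange) - 1;
        int remainder = originalPartitions % baseRange;
        boolean shortLastRange = destinationTaskIndex == lastIndex && remainder != 0;
        int range = shortLastRange ? remainder : baseRange;
        return numSourceTasks * range;
    }

    public static void main(String[] args) {
        int tokenizerTasks = 1;     // assumed value
        int originalPartitions = 2; // assumed Summation parallelism before reduction
        int baseRange = 2;          // both partitions merged into one task

        // Count if the edge manager had been switched before the task is recovered.
        int expected = mergedRangeInputs(tokenizerTasks, originalPartitions, baseRange, 0);
        // Count if the stale scatter-gather edge manager is still in place.
        int actual = scatterGatherInputs(tokenizerTasks);

        System.out.println("expected physicalEdgeCount=" + expected);
        System.out.println("actual physicalEdgeCount=" + actual);
    }
}
```

      Under these assumptions the merged-range rule gives 2 and the stale scatter-gather rule gives 1, so a task recovered before the edge manager is restored would read only part of its input.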

              People

              • Assignee: Jeff Zhang (zjffdu)
              • Reporter: Jeff Zhang (zjffdu)
              • Votes: 0
              • Watchers: 4
