Pig / PIG-4911

Provide option to disable DAG recovery

Details

    • Improvement
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 0.17.0
    • None
    • None
    • Reviewed

    Description

      Tez 0.7 has many issues with DAG recovery when auto parallelism is in use, causing hung DAGs in many cases, because it did not write auto parallelism decisions to the recovery history. Recovery was rewritten in Tez 0.8 to handle that.
      Code was added to Tez to automatically disable recovery when auto parallelism is in use, so that both Pig and Tez would benefit. It works fine: the second AM attempt fails with a "DAG cannot be recovered" error when it sees vertices with auto parallelism. The problem is that it is hard for users to see what the actual failure was, and it is hard to debug as well, because the whole UI state is rewritten with the partial recovery information.
      Disabling recovery in Pig itself by setting tez.dag.recovery.enabled=false avoids launching the second attempt at all, since that attempt would eventually fail anyway. It also makes it easy to debug the original failure.
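
      The behavior described above can be sketched roughly as follows. This is only an illustration and is not taken from the attached patches: the class DagRecoverySketch, the method withRecoveryDefault, and the usesAutoParallelism flag are hypothetical names, and leaving an explicit user setting untouched is an assumption. Only the property name tez.dag.recovery.enabled comes from the description.

```java
import java.util.Properties;

// Hypothetical helper for illustration; not the code from the PIG-4911 patches.
class DagRecoverySketch {
    // Property name quoted in the issue description.
    static final String TEZ_DAG_RECOVERY_ENABLED = "tez.dag.recovery.enabled";

    // Returns a copy of conf with DAG recovery disabled when the plan uses
    // auto parallelism. An explicit user setting is left untouched
    // (an assumption of this sketch).
    static Properties withRecoveryDefault(Properties conf, boolean usesAutoParallelism) {
        Properties result = new Properties();
        result.putAll(conf);
        if (usesAutoParallelism && result.getProperty(TEZ_DAG_RECOVERY_ENABLED) == null) {
            result.setProperty(TEZ_DAG_RECOVERY_ENABLED, "false");
        }
        return result;
    }
}
```

      With recovery turned off this way, a failed first AM attempt is the one users see, rather than a second attempt that ends in a recovery error.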

      Attachments

        1. PIG-4911-1.patch
          9 kB
          Rohini Palaniswamy
        2. PIG-4911-2.patch
          20 kB
          Rohini Palaniswamy

        Activity

          People

            rohini Rohini Palaniswamy
            rohini Rohini Palaniswamy
