Apache Tez / TEZ-3479

DAG AM does not schedule any more containers in corner cases

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.7.1
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

    Description

      Env: 3-node AWS cluster with data residing in S3. Tez version is 0.7.

      Some workloads generate so much data that tasks start throwing "No space available" errors on local disks (e.g., Q29 in TPC-DS). The DAG should fail after enough retries, which is what happens most of the time. Once in a while (roughly once in 20-30 runs), the DAG AM gets into a hung state and does not schedule any more containers for the failed task attempts. Will attach the logs shortly.
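The expected behaviour described above (retry a failed attempt until a limit is reached, then fail the DAG) can be sketched as a small model. This is only an illustration, not Tez code: `DagModel`, `TaskState`, `on_attempt_failed`, and the limit of 4 are hypothetical names and values chosen for the example. In a real deployment the limit would come from the AM's task retry setting (`tez.am.task.max.failed.attempts`).

```python
from dataclasses import dataclass, field
from typing import Dict

# Assumed retry limit for illustration only; not taken from the issue.
MAX_FAILED_ATTEMPTS = 4


@dataclass
class TaskState:
    """Hypothetical per-task bookkeeping: how many attempts have failed."""
    failed_attempts: int = 0


@dataclass
class DagModel:
    """Toy model of the retry decision; not the Tez DAG AM implementation."""
    tasks: Dict[str, TaskState] = field(default_factory=dict)
    failed: bool = False

    def on_attempt_failed(self, task_id: str) -> str:
        """Record a failed attempt and return the decision taken.

        Every failure must end in exactly one of two outcomes:
        a new attempt is requested, or the DAG is failed.
        The hang reported here corresponds to neither happening.
        """
        task = self.tasks.setdefault(task_id, TaskState())
        task.failed_attempts += 1
        if task.failed_attempts >= MAX_FAILED_ATTEMPTS:
            self.failed = True
            return "DAG_FAILED"
        return "RESCHEDULED"


if __name__ == "__main__":
    dag = DagModel()
    # Simulate a task whose every attempt hits "No space available".
    for attempt in range(1, MAX_FAILED_ATTEMPTS + 1):
        print(attempt, dag.on_attempt_failed("task_0"))
```

In this model a task that keeps failing always drives the DAG to a terminal state. The reported problem is the case where a failed attempt produces neither a new container request nor a DAG failure.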

          People

            Assignee: Unassigned
            Reporter: Rajesh Balamohan (rajesh.balamohan)
            Votes: 0
            Watchers: 3