  1. Hadoop YARN
  2. YARN-9934

LogAggregationService should not submit aggregator when app dir creation fails

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: log-aggregation
    • Labels: None
    • Flags: Patch

    Description

      Before submitting a log aggregation runnable, LogAggregationService tries to create the aggregated log dir.

      In some cases this may fail (e.g. when the number of directories exceeds the maximum limit).

       

      When the creation fails but the runnable is still submitted to LogAggregationService, the runnable may run forever if some application state transition misbehaves (e.g. the application complete event is not handled correctly, thus keeping appFinishing of AppLogAggregatorImpl always true).

       

      In our production environment (version 2.7.3), this caused a huge number of dangling aggregators (400+ LogAggregationService threads alive on some nodes, where the NodeManager is configured with only 50+ vCPUs).

       

      The patch throws the creation exception early, which avoids starting unnecessary log polling.
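
      The fail-fast idea can be sketched as follows. This is a minimal illustration only, not the attached patch and not the real LogAggregationService code: the class FailFastAggregationSketch, the AppDirCreator interface, and the initApp method are hypothetical names, and a plain ExecutorService stands in for the service's thread pool.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the service; not the real YARN class. */
class FailFastAggregationSketch {

    /** Hypothetical hook that creates the per-application aggregated log dir. */
    interface AppDirCreator {
        void createAppDir(String appId) throws IOException;
    }

    private final ExecutorService threadPool = Executors.newCachedThreadPool();
    private final AtomicInteger submitted = new AtomicInteger();
    private final AppDirCreator dirCreator;

    FailFastAggregationSketch(AppDirCreator dirCreator) {
        this.dirCreator = dirCreator;
    }

    /**
     * Creates the app dir first and only submits the aggregator if that
     * succeeded. A creation failure is rethrown to the caller, so no
     * polling runnable is ever started for that application.
     */
    void initApp(String appId, Runnable aggregator) {
        try {
            dirCreator.createAppDir(appId);
        } catch (IOException e) {
            // Fail fast: surface the error instead of submitting the runnable.
            throw new UncheckedIOException(
                "Failed to create aggregated log dir for " + appId, e);
        }
        submitted.incrementAndGet();
        threadPool.execute(aggregator);
    }

    /** Number of aggregators handed to the thread pool so far. */
    int submittedCount() {
        return submitted.get();
    }

    void shutdown() {
        threadPool.shutdown();
    }
}
```

      With this ordering, a node that hits the directory limit rejects the application's aggregator immediately instead of accumulating one long-lived thread per failed application.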

      Attachments

        1. YARN-9934.patch
          2 kB
          Zizon
        2. YARN-9934.patch.1
          2 kB
          Zizon

            People

              Assignee: Unassigned
              Reporter: Zizon
              Votes: 0
              Watchers: 1