  1. Hadoop YARN
  2. YARN-9934

LogAggregationService should not submit aggregator when app dir creation fails

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: log-aggregation
    • Labels: None
    • Flags: Patch

      Description

      Before submitting a log aggregation runnable, LogAggregationService tries to create the aggregated log dir.

      In some cases, this may fail (e.g. the number of dirs exceeds the max limit).

      When the creation fails and the runnable is still submitted to LogAggregationService, the runnable may run forever if an app state transition misbehaves (e.g. the application complete event is not handled correctly, thus keeping appFinishing of AppLogAggregatorImpl always true).

      In our production environment (version 2.7.3), this caused a huge number of dangling aggregators (400+ LogAggregationService threads alive on some nodes, where the NodeManager is configured with only 50+ vCPUs).

      The patch throws the creation exception early, which avoids starting unnecessary log polling.
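
      The fail-fast idea can be sketched roughly as follows. This is a minimal illustration, not the attached patch and not the real YARN code: the class FailFastAggregationSketch, the AppDirCreator interface, and the methods initApp and submittedCount are all hypothetical names made up for this example. It only shows the ordering: create the app dir first, and let the exception propagate before anything is handed to the thread pool.

```java
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: these are NOT the real YARN classes or method names.
class FailFastAggregationSketch {

    /** Stand-in for the step that creates the aggregated log dir of an app. */
    interface AppDirCreator {
        void createAppDir(String appId) throws IOException;
    }

    private final ExecutorService threadPool;
    private final AppDirCreator dirCreator;
    private final AtomicInteger submitted = new AtomicInteger();

    FailFastAggregationSketch(ExecutorService threadPool, AppDirCreator dirCreator) {
        this.threadPool = threadPool;
        this.dirCreator = dirCreator;
    }

    /**
     * Creates the app dir first. If that throws, the exception propagates
     * to the caller and the aggregator is never submitted, so no polling
     * thread is started for an app whose log dir does not exist.
     */
    void initApp(String appId, Runnable aggregator) throws IOException {
        dirCreator.createAppDir(appId);   // may throw: fail fast
        threadPool.execute(aggregator);   // reached only on success
        submitted.incrementAndGet();
    }

    /** Number of aggregators handed to the thread pool so far. */
    int submittedCount() {
        return submitted.get();
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newCachedThreadPool();
        // Simulated failure, e.g. the dir count limit has been reached.
        FailFastAggregationSketch service = new FailFastAggregationSketch(
            pool, appId -> { throw new IOException("dir limit exceeded"); });
        try {
            service.initApp("app_1", () -> { });
        } catch (IOException e) {
            System.out.println("not submitted: " + e.getMessage());
        }
        System.out.println("submitted aggregators: " + service.submittedCount());
        pool.shutdown();
    }
}
```

      With this ordering, a failed dir creation costs one exception instead of one long-lived thread per affected application.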

        Attachments

        1. YARN-9934.patch
          2 kB
          Zizon
        2. YARN-9934.patch.1
          2 kB
          Zizon

              People

              • Assignee: Unassigned
              • Reporter: Zizon