  1. Hadoop YARN
  2. YARN-9686

Reduce visibility of blacklisted nodes information (only for current app attempt) to avoid the abuse of memory

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: resourcemanager
    • Labels: None

    Description

      Recently we hit an issue where the RM went into a long GC pause, and the RM log was flooded with WARN messages (Ignoring Blacklists, blacklist size 1775 is more than failure threshold ratio 0.20000000298023224 out of total usable nodes 1778) at a very high rate of more than 30,000 per second.
      The direct cause is that a few apps with a large number of attempts and many blacklisted nodes were being requested frequently via the REST API or the web UI. For every single request, the RM has to allocate new memory for the blacklisted nodes many times (N * NUM_ATTEMPTS).
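
      To make the numbers in the WARN message concrete, here is a minimal sketch of the kind of check that could produce it. It assumes the blacklist is ignored once its size reaches ratio * usable nodes, which is what the message wording suggests; the class and method names are hypothetical and this is not the actual RM code. The long ratio in the log looks like the float 0.2 printed as a double, and the sketch reproduces that.

```java
// Hypothetical sketch: names are illustrative, not the real ResourceManager classes.
final class BlacklistThresholdSketch {

    /** Returns true when the blacklist is large enough to be ignored (assumed rule). */
    static boolean isIgnored(int blacklistSize, int usableNodes, float thresholdRatio) {
        // Widening the float to double is what turns 0.2f into 0.20000000298023224.
        double failureThreshold = (double) thresholdRatio * usableNodes;
        return blacklistSize >= failureThreshold;
    }

    /** Builds a message with the same wording as the WARN line quoted in this issue. */
    static String warnMessage(int blacklistSize, int usableNodes, float thresholdRatio) {
        return "Ignoring Blacklists, blacklist size " + blacklistSize
            + " is more than failure threshold ratio " + (double) thresholdRatio
            + " out of total usable nodes " + usableNodes;
    }

    public static void main(String[] args) {
        int blacklistSize = 1775;   // from the log line
        int usableNodes = 1778;     // from the log line
        float ratio = 0.2f;         // assumed to be a float, given the printed value

        // The threshold is roughly 355.6 nodes, so 1775 blacklisted nodes is far above it.
        if (isIgnored(blacklistSize, usableNodes, ratio)) {
            System.out.println(warnMessage(blacklistSize, usableNodes, ratio));
        }
    }
}
```

      With 1775 of 1778 nodes blacklisted, every evaluation of such a check ends up on the WARN path, so the log rate follows the request rate multiplied by the number of attempts.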

      Currently both the AM (system) blacklisted nodes and the app blacklisted nodes are passed along from one app attempt to the next, and there is only one instance of each, so traversing all blacklisted nodes for every app attempt is redundant and costly. I propose getting and showing blacklisted nodes only for the current app attempt, to improve performance and avoid wasting memory in similar scenarios.
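
      A rough illustration of the cost difference, as a self-contained sketch rather than the attached patch. All class and method names are hypothetical, the attempt count of 50 is made up for the example, and the getter is assumed to return a fresh copy on every call.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of one blacklist instance shared by all attempts of an app.
final class BlacklistVisibilitySketch {

    static final class SharedBlacklist {
        private final Set<String> nodes = new HashSet<>();
        int copiesMade = 0;

        void add(String node) {
            nodes.add(node);
        }

        /** Assumed behaviour: every read allocates a new copy of the node set. */
        Set<String> snapshot() {
            copiesMade++;
            return new HashSet<>(nodes);
        }
    }

    static final class Attempt {
        final int id;
        final SharedBlacklist blacklist;

        Attempt(int id, SharedBlacklist blacklist) {
            this.id = id;
            this.blacklist = blacklist;
        }
    }

    /** Builds an app whose attempts all point at the same blacklist instance. */
    static List<Attempt> buildApp(int numAttempts, int numNodes, SharedBlacklist shared) {
        for (int i = 0; i < numNodes; i++) {
            shared.add("node-" + i);
        }
        List<Attempt> attempts = new ArrayList<>();
        for (int i = 1; i <= numAttempts; i++) {
            attempts.add(new Attempt(i, shared));
        }
        return attempts;
    }

    /** Behaviour described in this issue: one copy of the blacklist per attempt. */
    static int reportAllAttempts(List<Attempt> attempts) {
        int entries = 0;
        for (Attempt attempt : attempts) {
            entries += attempt.blacklist.snapshot().size();
        }
        return entries;
    }

    /** Proposed behaviour: only the current (latest) attempt reports the blacklist. */
    static int reportCurrentAttemptOnly(List<Attempt> attempts) {
        Attempt current = attempts.get(attempts.size() - 1);
        return current.blacklist.snapshot().size();
    }

    public static void main(String[] args) {
        SharedBlacklist shared = new SharedBlacklist();
        List<Attempt> attempts = buildApp(50, 1775, shared);

        int all = reportAllAttempts(attempts);
        System.out.println("all attempts: entries=" + all + ", copies=" + shared.copiesMade);

        shared.copiesMade = 0;
        int current = reportCurrentAttemptOnly(attempts);
        System.out.println("current only: entries=" + current + ", copies=" + shared.copiesMade);
    }
}
```

      In this model the per-request allocation grows linearly with the number of attempts when every attempt is reported, while reporting only the current attempt keeps it at a single copy regardless of how many attempts the app has had.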

      Attachments

        1. YARN-9686.001.patch
          7 kB
          Tao Yang

          People

            Assignee: Tao Yang
            Reporter: Tao Yang