  1. Hadoop YARN
  2. YARN-11152

QueueMetrics is leaking memory when creating a new queue during reinitialisation

Details

    • Reviewed

    Description

      Capacity Scheduler handles reinitialisation by reparsing the entire queue hierarchy, then reinitialising the old queue hierarchy by taking the newly parsed queues into account. After this, the newly parsed queues are discarded and become eligible for garbage collection.
      However, since the introduction of YARN-6492, we store a parent queue in QueueMetrics. This is problematic because, at that point, the stored parent reference can point to a newly parsed parent queue (which should be discarded after the reinitialisation). As a result, QueueMetrics can hold parent queues that belong to an entirely different queue hierarchy than the one currently in use. This can lead to subtle problems as well as a memory leak, because a single parent reference keeps the whole queue hierarchy alive.
      This problem arose when we programmatically added one queue after another via the mutation API, thus keeping hundreds of queue hierarchies alive at the same time, crippling the GC and the whole RM.
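
      The retention pattern described above can be illustrated with a minimal, self-contained sketch. It is not the actual Capacity Scheduler code: the classes Queue and Metrics, the method names reinitialize and retainedHierarchies, and the path-keyed cache are all hypothetical stand-ins, assumed here only to show how a long-lived metrics object that captures a freshly parsed parent keeps that parse result reachable.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class QueueMetricsLeakSketch {

    /** Hypothetical stand-in for a scheduler queue; not the real CSQueue. */
    static final class Queue {
        final String path;
        final Queue parent;

        Queue(String path, Queue parent) {
            this.path = path;
            this.parent = parent;
        }
    }

    /** Hypothetical stand-in for QueueMetrics, holding a strong parent reference. */
    static final class Metrics {
        final String queuePath;
        final Queue parent;

        Metrics(String queuePath, Queue parent) {
            this.queuePath = queuePath;
            this.parent = parent;
        }
    }

    // Assumption: metrics outlive a single reinitialisation and are looked up by queue path.
    private final Map<String, Metrics> metricsByPath = new HashMap<>();

    /** Reparses the hierarchy; metrics are created only for queues not seen before. */
    void reinitialize(List<String> leafNames) {
        Queue newlyParsedRoot = new Queue("root", null);
        for (String leaf : leafNames) {
            String path = "root." + leaf;
            // A queue that is new in this parse gets metrics that capture the
            // newly parsed parent, which was meant to be thrown away.
            metricsByPath.computeIfAbsent(path, p -> new Metrics(p, newlyParsedRoot));
        }
        // newlyParsedRoot goes out of scope here, but any Metrics created
        // above still references it, so it cannot be collected.
    }

    /** Counts the distinct parsed roots still reachable through the metrics cache. */
    int retainedHierarchies() {
        Set<Queue> roots = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Metrics m : metricsByPath.values()) {
            roots.add(m.parent);
        }
        return roots.size();
    }

    public static void main(String[] args) {
        QueueMetricsLeakSketch scheduler = new QueueMetricsLeakSketch();
        // Add one queue per reinitialisation, as the mutation API scenario does.
        scheduler.reinitialize(Arrays.asList("a"));
        scheduler.reinitialize(Arrays.asList("a", "b"));
        scheduler.reinitialize(Arrays.asList("a", "b", "c"));
        System.out.println(scheduler.retainedHierarchies()); // prints 3
    }
}
```

      In this sketch every reinitialisation that adds a queue pins one more parsed root, which mirrors how adding queues one at a time can accumulate many hierarchies. The sketch does not model the fix; it only shows why a strong parent reference held by a long-lived metrics object is enough to prevent collection.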

            People

              quapaw András Győri
              Votes: 0
              Watchers: 2

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h