Hadoop YARN / YARN-11114

RMWebServices returns only apps matching exactly the submitted queue name



    • Reviewed


      I've added 2 testcases that demonstrate the issue with this commit.

      1. In 'testAppsQueryByQueueShortname', a finishedApp is submitted to "root.default" and a runningApp is submitted to "default".
      The test case queries the apps by queue name "default", and the response contains only the runningApp, which was submitted to "default". The app submitted to "root.default" is not returned.

      2. In 'testAppsQueryByQueueFullname', a finishedApp is submitted to "root.default" and a runningApp is submitted to "default" (same setup as above).
      The test case queries the apps by queue name "root.default" (the full queue path), and the response contains only the finishedApp, which was submitted to "root.default". The app submitted to "default" is not returned.

      It follows that the response includes only those applications whose queue name, either specified explicitly at submission or resolved by the placement engine, exactly matches the queue name in the query.
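
      The behaviour observed in the two test cases can be reproduced with a minimal, self-contained sketch. The App class and the filterByExactQueue helper below are hypothetical stand-ins, not the real RMApp or ClientRMService code; the only assumption carried over from the tests is that each app stores a single queue string, which is compared to the filter with an exact string match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ExactQueueMatchDemo {
    /** Hypothetical stand-in for an application; not the real RMApp interface. */
    static final class App {
        final String id;
        final String submittedQueue;

        App(String id, String submittedQueue) {
            this.id = id;
            this.submittedQueue = submittedQueue;
        }
    }

    /** Same setup as both test cases: one app per spelling of the queue name. */
    static List<App> sampleApps() {
        return Arrays.asList(
            new App("finishedApp", "root.default"),
            new App("runningApp", "default"));
    }

    /** Keeps only the apps whose stored queue string equals the filter exactly. */
    static List<String> filterByExactQueue(List<App> apps, String queueFilter) {
        List<String> ids = new ArrayList<>();
        for (App app : apps) {
            if (app.submittedQueue.equals(queueFilter)) {
                ids.add(app.id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(filterByExactQueue(sampleApps(), "default"));      // prints [runningApp]
        System.out.println(filterByExactQueue(sampleApps(), "root.default")); // prints [finishedApp]
    }
}
```

      Neither query returns both apps, even though "default" and "root.default" refer to the same queue.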

      Before YARN-9879 was implemented, Capacity Scheduler allowed a given leaf queue name to be defined only once in the whole hierarchy, meaning that leaf queue names were unique.
      For example, root.a.testQueue and root.b.testQueue couldn't coexist, as the leaf queue name is the same.

      At this point, I supposed that YARN-9879 was causing this issue: before YARN-9879 was merged, CS didn't allow two leaf queues with the same name, so a query for either "root.default" or "default" could easily work, as it was guaranteed that there was only one "default" leaf queue in the hierarchy. I dug a bit further.

      I also noticed that YARN-8659 (commit link) could have introduced this issue a long time ago, as it removed the iterator logic that queried the applications with method YarnScheduler#getAppsInQueue (see this).

      Let's follow the implementation of YarnScheduler#getAppsInQueue for CS:
      1. First of all, here is the method definition.
      CapacityScheduler#getQueue is called from here.

      2. CapacityScheduler#getQueue then calls QueueManager#getQueue.

      3. QueueManager#getQueue then calls CSQueueStore#get.

      4. CSQueueStore#get calls the 'getMap' field's getOrDefault method here.

      4.1 CSQueueStore#getMap (field) stores the Queue objects mapped to their short and full names (e.g. 'default' and 'root.default').
      CSQueueStore#add is the method that is responsible for adding the CSQueue objects.

      4.2 The first getMap.put call is invoked here with the full queue name.

      4.3 The second getMap.put call is invoked via CSQueueStore#updateGetMapForShortName here.
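
      The lookup chain above comes down to one map that holds each queue under two keys. Here is a simplified sketch of that idea, not the actual CSQueueStore code: the Queue class is a hypothetical stand-in for CSQueue, and the sketch assumes leaf queue names are unique, so it does not model how ambiguous short names are handled.

```java
import java.util.HashMap;
import java.util.Map;

public class QueueStoreSketch {
    /** Hypothetical stand-in for a CSQueue; it only carries the full path. */
    static final class Queue {
        final String fullPath;

        Queue(String fullPath) {
            this.fullPath = fullPath;
        }
    }

    // One map, two keys per queue: the full path and the short (leaf) name.
    private final Map<String, Queue> getMap = new HashMap<>();

    /** Registers the queue under its full path and under its short name. */
    void add(Queue queue) {
        getMap.put(queue.fullPath, queue);
        String shortName = queue.fullPath.substring(queue.fullPath.lastIndexOf('.') + 1);
        getMap.put(shortName, queue);
    }

    /** Returns the queue for either spelling, or null if the name is unknown. */
    Queue get(String name) {
        return getMap.getOrDefault(name, null);
    }

    public static void main(String[] args) {
        QueueStoreSketch store = new QueueStoreSketch();
        store.add(new Queue("root.default"));
        // Both spellings resolve to the same Queue instance.
        System.out.println(store.get("default") == store.get("root.default")); // prints true
    }
}
```

      Because both keys point at the same object, any caller that resolves the queue through the store sees no difference between the short and the full name.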

      In conclusion, the filtering of apps by queue in ClientRMService#getApplications seems wrong to me.
      The block that filters by queues is here.

      This should be enhanced by querying the apps from YarnScheduler#getAppsInQueue, as in the end it handles both the short and the full queue names for CS.
      It's crucial not to just fall back to the logic that was replaced by YARN-8659 (commit link), because the original issue there was that rmContext.getRMApps() returns both running and finished apps, while scheduler.getAppsInQueue returns only running apps.
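
      One possible shape of such an enhancement is sketched below. This is an illustration under stated assumptions, not the actual patch: App, leafName, runningAppsInQueue and filterByQueue are hypothetical names, runningAppsInQueue only imitates what scheduler.getAppsInQueue is described to do (running apps only, short or full name accepted), and leaf queue names are assumed to be unique.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CombinedQueueFilterSketch {
    /** Hypothetical stand-in for an application; not the real RMApp interface. */
    static final class App {
        final String id;
        final String submittedQueue;
        final boolean running;

        App(String id, String submittedQueue, boolean running) {
            this.id = id;
            this.submittedQueue = submittedQueue;
            this.running = running;
        }
    }

    /** Last path segment; assumes leaf queue names are unique in the hierarchy. */
    static String leafName(String queue) {
        return queue.substring(queue.lastIndexOf('.') + 1);
    }

    /** Imitates the scheduler lookup: running apps only, short or full name accepted. */
    static Set<String> runningAppsInQueue(List<App> apps, String queue) {
        Set<String> ids = new HashSet<>();
        for (App app : apps) {
            if (app.running && leafName(app.submittedQueue).equals(leafName(queue))) {
                ids.add(app.id);
            }
        }
        return ids;
    }

    /** Keeps an app if its stored name matches exactly or the scheduler reports it. */
    static List<String> filterByQueue(List<App> apps, String queueFilter) {
        Set<String> runningInQueue = runningAppsInQueue(apps, queueFilter);
        List<String> ids = new ArrayList<>();
        for (App app : apps) {
            if (app.submittedQueue.equals(queueFilter) || runningInQueue.contains(app.id)) {
                ids.add(app.id);
            }
        }
        return ids;
    }

    static List<App> sampleApps() {
        return Arrays.asList(
            new App("finishedApp", "root.default", false),
            new App("runningApp", "default", true));
    }

    public static void main(String[] args) {
        System.out.println(filterByQueue(sampleApps(), "root.default")); // prints [finishedApp, runningApp]
        System.out.println(filterByQueue(sampleApps(), "default"));      // prints [runningApp]
    }
}
```

      In this sketch the running app is found under either spelling, while the finished app is still found only under the exact name it was stored with.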


      NOTE #1: 
      As there's no way to get both the short and the full queue name from RMApp / RMAppImpl, it's currently not possible to compare the queue filter of the RM client request with both types of queue name of the application.

      NOTE #2:
      scheduler.getAppsInQueue(queue) returns only running apps, so running apps can be retrieved by queue name, and this works with both short and full names. For non-running apps, however, only the queue name the app was submitted with would work for filtering.

      NOTE #3 (plan for implementation):
      It would be completely reasonable to consider both running and non-running apps while querying; however, I think it never worked that way.
      Before YARN-8659, only running apps were considered; before YARN-9879, both running and non-running apps were considered, but only the queue name stored in RMAppImpl was compared to the filter's queue name, and that stored name was either the short or the full queue name.
      All in all, I don't want to change this behavior, and I also think it would make the code more convoluted if RMAppImpl stored both the short and the full queue names.


              snemeth Szilard Nemeth