  1. Hadoop YARN
  2. YARN-9640

Slow event processing could cause too many attempt unregister events


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • None
    • 3.3.0, 3.2.1
    • None
    • Reviewed

    Description

      While verifying one of our test clusters, we found that the number of attempt unregister events had grown to more than 300k.

      1. All of the AM's containers have completed.
      2. AMRMClientImpl sends finishApplicationMaster.
      3. AMRMClient then checks the finish status every 100 ms by resending the finishApplicationMaster request.
      4. AMRMClientImpl#unregisterApplicationMaster:
              while (true) {
                FinishApplicationMasterResponse response =
                    rmClient.finishApplicationMaster(request);
                if (response.getIsUnregistered()) {
                  break;
                }
                LOG.info("Waiting for application to be successfully unregistered.");
                Thread.sleep(100);
              }
        
      5. The ApplicationMasterService finishApplicationMaster interface sends an unregister event on every one of these status updates.
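
      To make the scale of the problem concrete, the steps above can be sketched as a small simulation. This is only an illustration under stated assumptions: NaiveUnregisterService, PollingDemo and the string-based event queue are hypothetical stand-ins, not the real ApplicationMasterService or its dispatcher, and the sleep between polls is left out so the example finishes immediately.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical stand-in for the RM side; NOT the real ApplicationMasterService. */
class NaiveUnregisterService {
    /** Stands in for the RM event queue; events are plain strings here. */
    final Queue<String> eventQueue = new ArrayDeque<>();
    private boolean unregistered = false;

    /** Mimics the reported behaviour: every call before completion enqueues another event. */
    boolean finishApplicationMaster(String attemptId) {
        if (!unregistered) {
            eventQueue.add("UNREGISTERED:" + attemptId);
        }
        return unregistered;
    }

    /** Simulates the (slow) event processing finally catching up. */
    void markUnregistered() {
        unregistered = true;
    }
}

class PollingDemo {
    /** Polls like the client loop, but for a bounded number of rounds and without sleeping. */
    static int pollWhileProcessingIsStalled(NaiveUnregisterService rm, String attemptId, int polls) {
        for (int i = 0; i < polls; i++) {
            if (rm.finishApplicationMaster(attemptId)) {
                break;
            }
        }
        return rm.eventQueue.size();
    }

    public static void main(String[] args) {
        // One poll every 100 ms is 10 polls per second, so 600 polls model one stalled minute.
        int queued = pollWhileProcessingIsStalled(new NaiveUnregisterService(), "attempt_1", 600);
        System.out.println(queued); // prints 600
    }
}
```

      In this model a single attempt adds 600 events for each minute that event processing is stalled, so a few hundred attempts finishing at about the same time would be enough to reach counts of the order reported above.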

      We should send the unregister event only once and cache the fact that it has been sent. Subsequent requests should then be ignored and answered with a "not unregistered" response to the AM, so that the event queue is not overloaded.
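
      A minimal sketch of this idea follows. It is not the attached patch: DedupingUnregisterService, its two sets and the string-based event queue are hypothetical names chosen for illustration, and the real change may track the "already sent" state differently.

```java
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical sketch of "send once, then cache"; NOT the code from the attached patch. */
class DedupingUnregisterService {
    /** Stands in for the RM event queue; events are plain strings here. */
    final Queue<String> eventQueue = new ConcurrentLinkedQueue<>();
    /** Attempts for which an unregister event has already been enqueued. */
    private final Set<String> unregisterSent = ConcurrentHashMap.newKeySet();
    /** Attempts whose unregister event has been fully processed. */
    private final Set<String> unregistered = ConcurrentHashMap.newKeySet();

    /** Returns true once the attempt is unregistered; enqueues at most one event per attempt. */
    boolean finishApplicationMaster(String attemptId) {
        if (unregistered.contains(attemptId)) {
            return true;
        }
        // Set.add returns true only for the first caller, so repeated polls add nothing.
        if (unregisterSent.add(attemptId)) {
            eventQueue.add("UNREGISTERED:" + attemptId);
        }
        return false; // "not unregistered" response: the AM keeps polling
    }

    /** Called by the (simulated) event handler once the unregister event has been processed. */
    void onUnregisterProcessed(String attemptId) {
        unregistered.add(attemptId);
    }
}

class DedupDemo {
    public static void main(String[] args) {
        DedupingUnregisterService rm = new DedupingUnregisterService();
        for (int i = 0; i < 600; i++) {
            rm.finishApplicationMaster("attempt_1");
        }
        System.out.println(rm.eventQueue.size()); // prints 1
    }
}
```

      With this shape the client-side polling loop does not need to change: it still receives a "not unregistered" response until processing completes, but the queue grows by one event per attempt instead of one event per poll.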

      Attachments

        1. YARN-9640-branch-3.2.001.patch
          6 kB
          Bibin Chundatt
        2. YARN-9640.003.patch
          5 kB
          Bibin Chundatt
        3. YARN-9640.002.patch
          6 kB
          Bibin Chundatt
        4. YARN-9640.001.patch
          6 kB
          Bibin Chundatt

          People

            Assignee: Bibin Chundatt (bibinchundatt)
            Reporter: Bibin Chundatt (bibinchundatt)
            Votes: 0
            Watchers: 4
