Hadoop YARN / YARN-10557

Application may be leaked in the state store when the ResourceManager fails over.

    Details

      Description

      In the ResourceManager log, I found a large number of entries like the one below:

      2020-12-30 19:18:48,120 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAppManager: Max number of completed apps kept in state store met: maxCompletedAppsInStateStore = 2000, but not removing app application_1608912003714_0098 from state store as log aggregation have not finished yet.
      

      When I looked into this, I found that log aggregation for the application had already completed. While debugging, I found that the app's logAggregationStatusForAppReport was NOT_START. (Note: in my test cluster, I occasionally simulate an RM restart.)

      Suppose an application has finished and its logs have been aggregated, but it has not yet been removed from the RM. When the RM fails over, the new RM recovers the application from the state store. The log aggregation status is not persisted there, so the new RM cannot tell that aggregation finished, and logAggregationStatusForAppReport is never updated after recovery. It therefore stays NOT_START, and the app is never removed from the state store.
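A minimal sketch of the sequence described above, for illustration only. It is not the RMAppManager code: the class StateStoreLeakSketch, the CompletedApp type, the recover and evict methods, the two-value enum, and the app ids are all hypothetical, and the eviction loop is an assumed simplification of the real retention check. It only shows why a status that resets to NOT_START on recovery blocks removal indefinitely.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of the behaviour described in this issue.
// None of these types are the real YARN classes.
class StateStoreLeakSketch {

    // Reduced to two values for the sketch.
    enum LogAggregationStatus { NOT_START, SUCCEEDED }

    // A completed application as seen by the retention check.
    static final class CompletedApp {
        final String appId;
        LogAggregationStatus statusForAppReport;

        CompletedApp(String appId, LogAggregationStatus statusForAppReport) {
            this.appId = appId;
            this.statusForAppReport = statusForAppReport;
        }
    }

    // Models recovery after failover: the aggregation status is not persisted,
    // so the recovered app starts again at NOT_START (assumption of this sketch).
    static CompletedApp recover(String appId) {
        return new CompletedApp(appId, LogAggregationStatus.NOT_START);
    }

    // Models the retention check: while over the limit, remove the oldest apps,
    // but only those whose log aggregation is reported as finished.
    // Returns the ids that would be removed; the input list is not modified.
    static List<String> evict(List<CompletedApp> completedApps,
                              int maxCompletedAppsInStateStore) {
        List<String> removed = new ArrayList<>();
        int excess = completedApps.size() - maxCompletedAppsInStateStore;
        for (CompletedApp app : completedApps) {
            if (excess <= 0) {
                break;
            }
            if (app.statusForAppReport == LogAggregationStatus.SUCCEEDED) {
                removed.add(app.appId);
                excess--;
            }
            // Otherwise the app is skipped, matching the "not removing app ..." log line.
        }
        return removed;
    }

    public static void main(String[] args) {
        // Three apps that finished and were aggregated before the failover.
        List<CompletedApp> apps = new ArrayList<>();
        apps.add(recover("app_0001"));
        apps.add(recover("app_0002"));
        apps.add(recover("app_0003"));

        // The store is over the limit, yet nothing is eligible for removal.
        System.out.println("removed after failover: " + evict(apps, 1));
    }
}
```

In this model the removed list stays empty after recovery no matter how far the store exceeds the limit. An app only becomes removable once something sets its status back to a finished value, which is the update that is missing after failover.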

              People

              • Assignee:
                zhengchenyu
                Reporter:
                zhengchenyu