  Hadoop YARN / YARN-4381

Optimize container metrics in NodeManagerMetrics


Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.7.1
    • Fix Version/s: None
    • Component/s: metrics, nodemanager

    Description

      Recently, I found an issue with the NodeManager metrics: NodeManagerMetrics#containersLaunched does not actually count the number of containers that were launched successfully. A container can still fail after its start request is accepted, for example when it receives a kill command or when its localization fails, and in those cases it never runs. However, the counter is currently incremented in the code below whether the container later starts successfully or fails.

      // Excerpt from ContainerManagerImpl.startContainerInternal()
      Credentials credentials = parseCredentials(launchContext);

      Container container =
          new ContainerImpl(getConfig(), this.dispatcher,
              context.getNMStateStore(), launchContext,
              credentials, metrics, containerTokenIdentifier);
      ApplicationId applicationID =
          containerId.getApplicationAttemptId().getApplicationId();
      if (context.getContainers().putIfAbsent(containerId, container) != null) {
        NMAuditLogger.logFailure(user, AuditConstants.START_CONTAINER,
          "ContainerManagerImpl", "Container already running on this node!",
          applicationID, containerId);
        throw RPCUtil.getRemoteException("Container " + containerIdStr
            + " already is running on this node!!");
      }

      this.readLock.lock();
      try {
        if (!serviceStopped) {
          // Create the application
          Application application =
              new ApplicationImpl(dispatcher, user, applicationID, credentials, context);
          if (null == context.getApplications().putIfAbsent(applicationID,
            application)) {
            LOG.info("Creating a new application reference for app " + applicationID);
            LogAggregationContext logAggregationContext =
                containerTokenIdentifier.getLogAggregationContext();
            Map<ApplicationAccessType, String> appAcls =
                container.getLaunchContext().getApplicationACLs();
            context.getNMStateStore().storeApplication(applicationID,
                buildAppProto(applicationID, user, credentials, appAcls,
                  logAggregationContext));
            dispatcher.getEventHandler().handle(
              new ApplicationInitEvent(applicationID, appAcls,
                logAggregationContext));
          }

          this.context.getNMStateStore().storeContainer(containerId, request);
          dispatcher.getEventHandler().handle(
            new ApplicationContainerInitEvent(container));

          this.context.getContainerTokenSecretManager().startContainerSuccessful(
            containerTokenIdentifier);
          NMAuditLogger.logSuccess(user, AuditConstants.START_CONTAINER,
            "ContainerManageImpl", applicationID, containerId);
          // TODO launchedContainer misplaced -> doesn't necessarily mean a container
          // launch. A finished Application will not launch containers.
          // Incremented when the start request is accepted, before localization
          // or the actual launch has happened.
          metrics.launchedContainer();
          metrics.allocateContainer(containerTokenIdentifier.getResource());
        } else {
          throw new YarnException(
              "Container start failed as the NodeManager is " +
              "in the process of shutting down");
        }
      } finally {
        this.readLock.unlock();
      }
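      To illustrate the difference between counting at request time and counting at launch time, here is a minimal standalone sketch. It assumes a much simplified container lifecycle: the `ContainerState` enum, `SketchMetrics` and `SketchContainer` classes are hypothetical stand-ins for ContainerImpl's state machine and NodeManagerMetrics, and they use plain `AtomicInteger` counters rather than Hadoop's metrics library. The proposed counter is incremented only on the transition into RUNNING, which is presumably where a real fix would hook in.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified stand-ins for ContainerImpl states and
// NodeManagerMetrics; these are NOT the real Hadoop classes.
public class LaunchCountingSketch {

  enum ContainerState { NEW, LOCALIZING, LOCALIZATION_FAILED, RUNNING, KILLED }

  static class SketchMetrics {
    // Current counting point: when the start request is accepted.
    final AtomicInteger startRequests = new AtomicInteger();
    // Proposed counting point: when the container actually starts running.
    final AtomicInteger containersLaunched = new AtomicInteger();
  }

  static class SketchContainer {
    private ContainerState state = ContainerState.NEW;
    private final SketchMetrics metrics;

    SketchContainer(SketchMetrics metrics) {
      this.metrics = metrics;
      // Mirrors today's behaviour: counted as soon as the request is accepted.
      metrics.startRequests.incrementAndGet();
    }

    void transition(ContainerState next) {
      // Proposed behaviour: count a launch only when entering RUNNING.
      if (next == ContainerState.RUNNING && state != ContainerState.RUNNING) {
        metrics.containersLaunched.incrementAndGet();
      }
      state = next;
    }
  }

  public static void main(String[] args) {
    SketchMetrics m = new SketchMetrics();

    SketchContainer ok = new SketchContainer(m);
    ok.transition(ContainerState.LOCALIZING);
    ok.transition(ContainerState.RUNNING);

    SketchContainer failed = new SketchContainer(m);
    failed.transition(ContainerState.LOCALIZING);
    failed.transition(ContainerState.LOCALIZATION_FAILED);

    SketchContainer killed = new SketchContainer(m);
    killed.transition(ContainerState.KILLED);

    System.out.println("start requests = " + m.startRequests.get());
    System.out.println("launched       = " + m.containersLaunched.get());
  }
}
```

      Running main reports three start requests but only one launch: the container whose localization failed and the one killed before launch are no longer counted as launched.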
      

      In addition, NodeManagerMetrics has no metric for containers whose localization failed.
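      A minimal sketch of what a dedicated localization-failure counter could look like, assuming standalone code: `SketchNmMetrics` and its methods `localizationFailedContainer()` and `killedBeforeLaunchContainer()` are hypothetical names chosen to echo the style of `launchedContainer()`, not part of Hadoop's API. The reconciliation helper also assumes, for simplicity, that these three buckets are the only outcomes of an accepted start request.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: a standalone metrics holder with a dedicated
// localization-failure counter. Names are illustrative, not Hadoop's API.
public class LocalizationFailedMetricSketch {

  static class SketchNmMetrics {
    private final AtomicInteger containersLaunched = new AtomicInteger();
    private final AtomicInteger containersLocalizationFailed = new AtomicInteger();
    private final AtomicInteger containersKilledBeforeLaunch = new AtomicInteger();

    void launchedContainer() { containersLaunched.incrementAndGet(); }
    void localizationFailedContainer() { containersLocalizationFailed.incrementAndGet(); }
    void killedBeforeLaunchContainer() { containersKilledBeforeLaunch.incrementAndGet(); }

    int getContainersLaunched() { return containersLaunched.get(); }
    int getContainersLocalizationFailed() { return containersLocalizationFailed.get(); }
    int getContainersKilledBeforeLaunch() { return containersKilledBeforeLaunch.get(); }

    // Simplifying assumption: every accepted start request ends in exactly
    // one of these three buckets, so their sum accounts for all requests.
    int accountedStartRequests() {
      return getContainersLaunched() + getContainersLocalizationFailed()
          + getContainersKilledBeforeLaunch();
    }
  }

  public static void main(String[] args) {
    SketchNmMetrics metrics = new SketchNmMetrics();
    metrics.launchedContainer();
    metrics.launchedContainer();
    metrics.localizationFailedContainer();
    metrics.killedBeforeLaunchContainer();
    System.out.println("launched=" + metrics.getContainersLaunched()
        + " localizationFailed=" + metrics.getContainersLocalizationFailed()
        + " killedBeforeLaunch=" + metrics.getContainersKilledBeforeLaunch()
        + " accounted=" + metrics.accountedStartRequests());
  }
}
```

      With a separate counter, a localization failure shows up on its own instead of being hidden inside the launched count.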

      Attachments

        1. YARN-4381.003.patch (8 kB, Yiqun Lin)
        2. YARN-4381.002.patch (8 kB, Yiqun Lin)
        3. YARN-4381.001.patch (6 kB, Yiqun Lin)


          People

            Assignee: Yiqun Lin (linyiqun)
            Reporter: Yiqun Lin (linyiqun)
            Votes: 0
            Watchers: 6
