Hadoop YARN
YARN-3491

PublicLocalizer#addResource is too slow.

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.7.0
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: nodemanager
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

Based on profiling, the bottleneck in PublicLocalizer#addResource is getInitializedLocalDirs, which calls checkLocalDir.
checkLocalDir is very slow, taking about 10+ ms per call.
The total delay will be approximately (number of local dirs) * 10+ ms,
and this delay is added for each public resource localization.
Because PublicLocalizer#addResource is slow, the thread pool can't be fully utilized: instead of doing public resource localization in parallel (multithreading), public resource localization is serialized most of the time.

Also, PublicLocalizer#addResource runs in the Dispatcher thread,
so the Dispatcher thread will be blocked by PublicLocalizer#addResource for a long time.
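As a rough illustration of the arithmetic above (the dir count here is hypothetical; the ~10 ms per-call cost is from the profiling):

```java
// Hypothetical sketch of the per-resource delay described above; the
// number of local dirs (12) is an assumption, not from the JIRA.
public class AddResourceDelay {
    static long perResourceDelayMs(int numLocalDirs, long checkLocalDirMs) {
        // getInitializedLocalDirs ends up calling checkLocalDir once per local dir
        return (long) numLocalDirs * checkLocalDirMs;
    }

    public static void main(String[] args) {
        // e.g. a node with 12 local dirs pays ~120 ms per public resource
        System.out.println(perResourceDelayMs(12, 10) + " ms per public resource");
    }
}
```

With dozens of public resources per container, this adds seconds of serialized work on the Dispatcher thread.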

      1. YARN-3491.000.patch
        2 kB
        zhihai xu
      2. YARN-3491.001.patch
        3 kB
        zhihai xu
      3. YARN-3491.002.patch
        5 kB
        zhihai xu
      4. YARN-3491.003.patch
        15 kB
        zhihai xu
      5. YARN-3491.004.patch
        16 kB
        zhihai xu

        Issue Links

          Activity

Jason Lowe added a comment -

          Could you elaborate a bit on why the submit is time consuming? Unless I'm mistaken, the FSDownload constructor is very cheap and queueing should be simply tacking an entry on a queue.

zhihai xu added a comment -

          I saw the serialization for public resource localization in the following logs:
          The following log shows two private localization requests and many public localization requests from container_e30_1426628374875_110892_01_000475

          2015-04-07 22:49:56,750 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_e30_1426628374875_110892_01_000475 transitioned from NEW to LOCALIZING
          2015-04-07 22:49:56,751 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://nameservice1/user/databot/.staging/job_1426628374875_110892/job.xml transitioned from INIT to DOWNLOADING
          2015-04-07 22:49:56,751 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://nameservice1/user/databot/.staging/job_1426628374875_110892/job.jar transitioned from INIT to DOWNLOADING
          2015-04-07 22:49:56,751 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://nameservice1/tmp/temp1444482237/tmp-1316042064/reflections.jar transitioned from INIT to DOWNLOADING
          2015-04-07 22:49:56,751 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://nameservice1/tmp/temp1444482237/tmp-327542609/service-media-sdk.jar transitioned from INIT to DOWNLOADING
          2015-04-07 22:49:56,751 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://nameservice1/tmp/temp1444482237/tmp1631960573/service-local-search-sdk.jar transitioned from INIT to DOWNLOADING
          2015-04-07 22:49:56,751 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://nameservice1/tmp/temp1444482237/tmp-1521315530/ace-geo.jar transitioned from INIT to DOWNLOADING
          2015-04-07 22:49:56,751 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://nameservice1/tmp/temp1444482237/tmp1347512155/cortex-server.jar transitioned from INIT to DOWNLOADING
          

          The following log shows how the public resource localizations are processed.

          2015-04-07 22:49:56,758 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Created localizer for container_e30_1426628374875_110892_01_000475
          
          2015-04-07 22:49:56,758 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://nameservice1/tmp/temp1444482237/tmp-1316042064/reflections.jar, 1428446867531, FILE, null }
          
          2015-04-07 22:49:56,882 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://nameservice1/tmp/temp1444482237/tmp-327542609/service-media-sdk.jar, 1428446864128, FILE, null }
          
          2015-04-07 22:49:56,902 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://nameservice1/tmp/temp1444482237/tmp-1316042064/reflections.jar(->/data2/yarn/nm/filecache/4877652/reflections.jar) transitioned from DOWNLOADING to LOCALIZED
          
          2015-04-07 22:49:57,127 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://nameservice1/tmp/temp1444482237/tmp1631960573/service-local-search-sdk.jar, 1428446858408, FILE, null }
          
          2015-04-07 22:49:57,145 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://nameservice1/tmp/temp1444482237/tmp-327542609/service-media-sdk.jar(->/data11/yarn/nm/filecache/4877653/service-media-sdk.jar) transitioned from DOWNLOADING to LOCALIZED
          
          2015-04-07 22:49:57,251 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://nameservice1/tmp/temp1444482237/tmp-1521315530/ace-geo.jar, 1428446862857, FILE, null }
          
          2015-04-07 22:49:57,270 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://nameservice1/tmp/temp1444482237/tmp1631960573/service-local-search-sdk.jar(->/data1/yarn/nm/filecache/4877654/service-local-search-sdk.jar) transitioned from DOWNLOADING to LOCALIZED
          
          2015-04-07 22:49:57,383 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://nameservice1/tmp/temp1444482237/tmp1347512155/cortex-server.jar, 1428446857069, FILE, null }
          

Based on the log, you can see the thread pool is not fully used: only one thread is active, even though the default thread pool size is 4.
"Downloading public rsrc" is printed from the Dispatcher thread.
"transitioned from DOWNLOADING to LOCALIZED" is printed from a PublicLocalizer thread.
You can see these two messages are interleaved:
          "Downloading public rsrc"
          "transitioned from DOWNLOADING to LOCALIZED"
          "Downloading public rsrc"
          "transitioned from DOWNLOADING to LOCALIZED"
          "Downloading public rsrc"
          "transitioned from DOWNLOADING to LOCALIZED"

Also, when you compare the time the Dispatcher thread takes to process a localization event for a public resource versus a private resource,
there is a huge difference:
The time to process two localization events for private resources in the Dispatcher thread is less than one millisecond,
based on the following log:

          2015-04-07 22:49:56,758 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Created localizer for container_e30_1426628374875_110892_01_000475
          2015-04-07 22:49:56,758 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://nameservice1/tmp/temp1444482237/tmp-1316042064/reflections.jar, 1428446867531, FILE, null }
          

The time to process one localization event for a public resource in the Dispatcher thread is 124 milliseconds,
based on the following log:

          2015-04-07 22:49:56,758 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://nameservice1/tmp/temp1444482237/tmp-1316042064/reflections.jar, 1428446867531, FILE, null }
          2015-04-07 22:49:56,882 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://nameservice1/tmp/temp1444482237/tmp-327542609/service-media-sdk.jar, 1428446864128, FILE, null }
          

The following is the code that processes localization events in the Dispatcher thread:

              public void handle(LocalizerEvent event) {
                String locId = event.getLocalizerId();
                switch (event.getType()) {
                case REQUEST_RESOURCE_LOCALIZATION:
                  // 0) find running localizer or start new thread
                  LocalizerResourceRequestEvent req =
                    (LocalizerResourceRequestEvent)event;
                  switch (req.getVisibility()) {
                  case PUBLIC:
                    publicLocalizer.addResource(req);
                    break;
                  case PRIVATE:
                  case APPLICATION:
                    synchronized (privLocalizers) {
                      LocalizerRunner localizer = privLocalizers.get(locId);
                      if (null == localizer) {
                        LOG.info("Created localizer for " + locId);
                        localizer = new LocalizerRunner(req.getContext(), locId);
                        privLocalizers.put(locId, localizer);
                        localizer.start();
                      }
                      // 1) propagate event
                      localizer.addResource(req);
                    }
                    break;
                  }
                  break;
                }
              }
          
zhihai xu added a comment -

Hi Jason Lowe, thanks for the comment. Queueing itself is fast, but it takes a longer time to hand the FSDownload to a new worker thread.
Only when all threads in the thread pool are already in use is the submit very fast, since it just adds an entry to the queue via LinkedBlockingQueue#offer.
Based on the following code in ThreadPoolExecutor#execute, corePoolSize is the thread pool size, which is 4 in this case.
workQueue.offer(command) is fast, but addWorker is slow, and the task is only queued when all threads in the thread pool are already running.

             public void execute(Runnable command) {
                  if (command == null)
                      throw new NullPointerException();
                  /*
                   * Proceed in 3 steps:
                   *
                   * 1. If fewer than corePoolSize threads are running, try to
                   * start a new thread with the given command as its first
                   * task.  The call to addWorker atomically checks runState and
                   * workerCount, and so prevents false alarms that would add
                   * threads when it shouldn't, by returning false.
                   *
                   * 2. If a task can be successfully queued, then we still need
                   * to double-check whether we should have added a thread
                   * (because existing ones died since last checking) or that
                   * the pool shut down since entry into this method. So we
                   * recheck state and if necessary roll back the enqueuing if
                   * stopped, or start a new thread if there are none.
                   *
                   * 3. If we cannot queue task, then we try to add a new
                   * thread.  If it fails, we know we are shut down or saturated
                   * and so reject the task.
                   */
                  int c = ctl.get();
                  if (workerCountOf(c) < corePoolSize) {
                      if (addWorker(command, true))
                          return;
                      c = ctl.get();
                  }
                  if (isRunning(c) && workQueue.offer(command)) {
                      int recheck = ctl.get();
                      if (! isRunning(recheck) && remove(command))
                          reject(command);
                      else if (workerCountOf(recheck) == 0)
                          addWorker(null, false);
                  }
                  else if (!addWorker(command, false))
                      reject(command);
              }
          

The issue is:
If the time to run one FSDownload (resource localization) is close to the time to run the submit (adding the FSDownload to a worker thread),
oscillation will happen and only one worker thread will be running. The Dispatcher thread will then be blocked for a longer time.
The above logs prove this situation: LocalizerRunner#addResource, used by the private localizer, takes less than one millisecond to process one REQUEST_RESOURCE_LOCALIZATION event, but PublicLocalizer#addResource, used by the public localizer, takes 124 milliseconds to process one REQUEST_RESOURCE_LOCALIZATION event.
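The effect can be reproduced outside YARN with a plain ThreadPoolExecutor. The following is a minimal sketch, not NodeManager code: sleeps stand in for the real submit and download costs, and the durations are made up. When the gap between submits is at least the task duration, a 4-thread pool degenerates to roughly one task in flight.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SlowSubmitDemo {
    // Returns the maximum number of tasks observed running concurrently.
    static int maxConcurrency(long submitGapMs, long taskMs, int tasks)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger max = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                int now = running.incrementAndGet();
                max.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(taskMs);       // stands in for the download
                } catch (InterruptedException ignored) {
                }
                running.decrementAndGet();
            });
            Thread.sleep(submitGapMs);          // stands in for the slow addResource
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return max.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // slow submit (gap >= task duration): the pool is effectively serialized
        System.out.println("slow submit: " + maxConcurrency(60, 40, 8));
        // fast submit: the pool fills up to its 4 threads
        System.out.println("fast submit: " + maxConcurrency(0, 100, 8));
    }
}
```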

Sangjin Lee added a comment -

          I have the same question as Jason Lowe. The actual call

                      synchronized (pending) {
                        pending.put(queue.submit(new FSDownload(lfs, null, conf,
                            publicDirDestPath, resource, request.getContext().getStatCache())),
                            request);
                      }
          

          should be completely non-blocking and there is nothing that's expensive about it with the possible exception of the synchronization. Could you describe the root cause of the slowness you're seeing in some more detail?

zhihai xu added a comment -

Hi Sangjin Lee, that is a good point. I had assumed queue.submit was the bottleneck. But queue.submit is just part of the code in PublicLocalizer#addResource; the bottleneck may come from publicRsrc.getPathForLocalization, since we added a lot of work to LocalResourcesTrackerImpl#getPathForLocalization, such as stateStore.startResourceLocalization(user, appId, ((LocalResourcePBImpl) lr).getProto(), localPath);

I should describe it more clearly. Based on the log, the issue is that PublicLocalizer#addResource is very slow, which blocks the Dispatcher thread. Looking at the following code in PublicLocalizer#addResource, I felt queue.submit might take most of the CPU cycles; based on Jason Lowe's and your comments, the slowness may instead come from other code such as publicRsrc.getPathForLocalization or dirsHandler.getLocalPathForWrite. Either way, I think moving all of this code in PublicLocalizer#addResource from the Dispatcher thread to the PublicLocalizer thread would be a good optimization. We can use a synchronized list of LocalizerResourceRequestEvent to store these events for public resource localization, similar to what LocalizerRunner does for private resource localization.
I will do some more profiling to see what the bottleneck in PublicLocalizer#addResource is.

              public void addResource(LocalizerResourceRequestEvent request) {
                // TODO handle failures, cancellation, requests by other containers
                LocalizedResource rsrc = request.getResource();
                LocalResourceRequest key = rsrc.getRequest();
                LOG.info("Downloading public rsrc:" + key);
                /*
                 * Here multiple containers may request the same resource. So we need
                 * to start downloading only when
                 * 1) ResourceState == DOWNLOADING
                 * 2) We are able to acquire non blocking semaphore lock.
                 * If not we will skip this resource as either it is getting downloaded
                 * or it FAILED / LOCALIZED.
                 */
          
                if (rsrc.tryAcquire()) {
                  if (rsrc.getState() == ResourceState.DOWNLOADING) {
                    LocalResource resource = request.getResource().getRequest();
                    try {
                      Path publicRootPath =
                          dirsHandler.getLocalPathForWrite("." + Path.SEPARATOR
                              + ContainerLocalizer.FILECACHE,
                            ContainerLocalizer.getEstimatedSize(resource), true);
                      Path publicDirDestPath =
                          publicRsrc.getPathForLocalization(key, publicRootPath);
                      if (!publicDirDestPath.getParent().equals(publicRootPath)) {
                        DiskChecker.checkDir(new File(publicDirDestPath.toUri().getPath()));
                      }
          
                      // In case this is not a newly initialized nm state, ensure
                      // initialized local/log dirs similar to LocalizerRunner
                      getInitializedLocalDirs();
                      getInitializedLogDirs();
          
                      // explicitly synchronize pending here to avoid future task
                      // completing and being dequeued before pending updated
                      synchronized (pending) {
                        pending.put(queue.submit(new FSDownload(lfs, null, conf,
                            publicDirDestPath, resource, request.getContext().getStatCache())),
                            request);
                      }
                    } catch (IOException e) {
                      rsrc.unlock();
                      publicRsrc.handle(new ResourceFailedLocalizationEvent(request
                        .getResource().getRequest(), e.getMessage()));
                      LOG.error("Local path for public localization is not found. "
                          + " May be disks failed.", e);
                    } catch (IllegalArgumentException ie) {
                      rsrc.unlock();
                      publicRsrc.handle(new ResourceFailedLocalizationEvent(request
                          .getResource().getRequest(), ie.getMessage()));
                      LOG.error("Local path for public localization is not found. "
                          + " Incorrect path. " + request.getResource().getRequest()
                          .getPath(), ie);
                    } catch (RejectedExecutionException re) {
                      rsrc.unlock();
                      publicRsrc.handle(new ResourceFailedLocalizationEvent(request
                        .getResource().getRequest(), re.getMessage()));
                      LOG.error("Failed to submit rsrc " + rsrc + " for download."
                          + " Either queue is full or threadpool is shutdown.", re);
                    }
                  } else {
                    rsrc.unlock();
                  }
                }
              }
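The restructuring proposed above (enqueue on the Dispatcher thread, do the expensive work on the localizer's own thread) could look roughly like this. The class and method shapes below are hypothetical, not the actual patch; a String stands in for LocalizerResourceRequestEvent.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch: the Dispatcher thread only enqueues; the localizer thread drains
// the queue and performs the slow path/dir work and the executor submit.
public class QueueingPublicLocalizer extends Thread {
    private final BlockingQueue<String> requests = new LinkedBlockingQueue<>();

    // Called from the Dispatcher thread: O(1), non-blocking.
    public void addResource(String request) {
        requests.offer(request);
    }

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                String req = requests.take();
                process(req); // getLocalPathForWrite, getPathForLocalization,
                              // getInitializedLocalDirs, queue.submit(...) etc.
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Placeholder for the slow work moved off the Dispatcher thread.
    protected void process(String request) {
    }
}
```

With this shape, the Dispatcher thread's cost per REQUEST_RESOURCE_LOCALIZATION event drops to a queue insert, matching how LocalizerRunner#addResource behaves for private resources.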
          
          zxu zhihai xu added a comment -

          Hi Jason Lowe and Sangjin Lee, I think I know what the bottleneck in PublicLocalizer#addResource is.
          I checked old NM logs from the 2.3.0 release code: PublicLocalizer#addResource took less than one millisecond in the 2.3.0 release.

          2014-10-21 18:11:10,956 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://nameservice1/tmp/temp-1620691366/tmp-602532977/asm.jar, 1413914982330, FILE, null }
          2014-10-21 18:11:10,956 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://nameservice1/tmp/temp-1620691366/tmp-983952127/start.jar, 1413914978818, FILE, null }
          2014-10-21 18:11:10,957 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://nameservice1/tmp/temp-1620691366/tmp-700474448/jsch.jar, 1413914981670, FILE, null }
          2014-10-21 18:11:10,957 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://nameservice1/tmp/temp-1620691366/tmp-295789958/kfs.jar, 1413914974035, FILE, null }
          2014-10-21 18:11:10,957 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://nameservice1/tmp/temp-1620691366/tmp1832142372/datasvc-search.jar, 1413914970738, FILE, null }
          2014-10-21 18:11:10,957 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://nameservice1/tmp/temp-1620691366/tmp-1244404847/args4j.jar, 1413914982044, FILE, null }
          2014-10-21 18:11:10,957 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://nameservice1/tmp/temp-1620691366/tmp729860031/slf4j-log4j12.jar, 1413914980407, FILE, null }
          2014-10-21 18:11:10,957 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://nameservice1/tmp/temp-1620691366/tmp-1748521227/jackson-mapper-asl.jar, 1413914983142, FILE, null }
          2014-10-21 18:11:10,957 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://nameservice1/tmp/temp-1620691366/tmp-246818030/jasper-compiler.jar, 1413914979243, FILE, null }
          2014-10-21 18:11:10,958 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://nameservice1/tmp/temp-1620691366/tmp-1703279108/spiffy.jar, 1413914974080, FILE, null }
          

          Then I compared the public localization code; the difference is in LocalResourcesTrackerImpl#getPathForLocalization.
          The following code was added after the 2.3.0 release:

              rPath = new Path(rPath,
                  Long.toString(uniqueNumberGenerator.incrementAndGet()));
              Path localPath = new Path(rPath, req.getPath().getName());
              LocalizedResource rsrc = localrsrc.get(req);
              rsrc.setLocalPath(localPath);
              LocalResource lr = LocalResource.newInstance(req.getResource(),
                  req.getType(), req.getVisibility(), req.getSize(),
                  req.getTimestamp());
              try {
                stateStore.startResourceLocalization(user, appId,
                    ((LocalResourcePBImpl) lr).getProto(), localPath);
              } catch (IOException e) {
                LOG.error("Unable to record localization start for " + rsrc, e);
              }
          

          I think stateStore.startResourceLocalization is most likely the bottleneck.
          startResourceLocalization stores the state in leveldb, and the leveldb operation is time-consuming: it needs to go through the JNI interface.

            public void startResourceLocalization(String user, ApplicationId appId,
                LocalResourceProto proto, Path localPath) throws IOException {
              String key = getResourceStartedKey(user, appId, localPath.toString());
              try {
                db.put(bytes(key), proto.toByteArray());
              } catch (DBException e) {
                throw new IOException(e);
              }
            }
          

          I think it would be better to do these levelDB operations in a separate thread using AsyncDispatcher in NMLeveldbStateStoreService.

          jlowe Jason Lowe added a comment -

          Storing asynchronously is going to be a bit dangerous: we do not want to create a situation where a resource has started localizing but we haven't recorded the fact that we started it. Theoretically we could end up doing a recovery where we leak a resource, or fail to realize that a localization started but did not complete and needs to be cleaned up.

          I think it's best at this point to have some hard evidence from a profiler or targeted log statements around the suspected code where all the time is being spent in the NM rather than guessing.
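          Targeted log statements of the kind suggested here can be as simple as a small timing helper around each suspect section. This is an illustrative sketch, not code from the NodeManager:

```java
// Hypothetical timing helper: wraps a code section and prints how long
// it took when the try-with-resources block closes.
public class TimedSection implements AutoCloseable {
    private final String label;
    private final long startNanos = System.nanoTime();

    public TimedSection(String label) {
        this.label = label;
    }

    // Elapsed wall-clock time in milliseconds since construction.
    public long elapsedMs() {
        return (System.nanoTime() - startNanos) / 1_000_000;
    }

    @Override
    public void close() {
        System.out.println(label + " took " + elapsedMs() + " ms");
    }
}
```

          Usage around the suspect code would look like `try (TimedSection t = new TimedSection("getInitializedLocalDirs")) { getInitializedLocalDirs(); }`, logging one line per section.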

          zxu zhihai xu added a comment -

          Yes, I agree that storing asynchronously is going to be a bit dangerous.
          I will do more profiling in PublicLocalizer#addResource to get detailed timings for each sub-section of the code.

          zxu zhihai xu added a comment -

          Hi Jason Lowe, you are right; I am sorry that all my previous guesses were wrong.
          I did the profiling, and the bottleneck is the following code:

          getInitializedLocalDirs();
          getInitializedLogDirs();
          

          More precisely, the bottleneck is checkLocalDir, which calls getFileStatus.
          I did two rounds of profiling:
          1. I measured the time in PublicLocalizer#addResource:
          The following code, including the leveldb operation, takes 1 ms.

                      Path publicRootPath =
                          dirsHandler.getLocalPathForWrite("." + Path.SEPARATOR
                              + ContainerLocalizer.FILECACHE,
                            ContainerLocalizer.getEstimatedSize(resource), true);
                      Path publicDirDestPath =
                          publicRsrc.getPathForLocalization(key, publicRootPath);
                      if (!publicDirDestPath.getParent().equals(publicRootPath)) {
                        DiskChecker.checkDir(new File(publicDirDestPath.toUri().getPath()));
                      }
          

          getInitializedLocalDirs and getInitializedLogDirs took 12 ms together.

          And the following queue.submit code took less than 1 ms.

                      synchronized (pending) {
                        pending.put(queue.submit(new FSDownload(lfs, null, conf,
                            publicDirDestPath, resource, request.getContext().getStatCache())),
                            request);
                      }
          

          2. Then I measured the time in getInitializedLocalDirs and getInitializedLogDirs.
          I found that checkLocalDir, which is called by getInitializedLocalDirs, is really slow:
          checkLocalDir takes 14 ms, and there is only one local dir in my test environment.

            synchronized private List<String> getInitializedLocalDirs() {
              List<String> dirs = dirsHandler.getLocalDirs();
              List<String> checkFailedDirs = new ArrayList<String>();
              for (String dir : dirs) {
                try {
                  checkLocalDir(dir);
                } catch (YarnRuntimeException e) {
                  checkFailedDirs.add(dir);
                }
              }
          

          The log in my previous comment has more than 10 local dirs, which means checkLocalDir is called more than 10 times.
          10 * 14 ms is roughly 140 ms, so that is where the 100+ ms delay comes from.
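          The arithmetic above can be modeled directly. The 14 ms per checkLocalDir call and the dir count are the measurements from this comment; the class itself is only illustrative:

```java
// Illustrative model of the delay described above: getInitializedLocalDirs
// calls checkLocalDir once per local dir, and the original code runs this
// for every public resource, serialized on the Dispatcher thread.
public class LocalizationDelayModel {
    static long addResourceDelayMs(int numLocalDirs, long checkLocalDirMs) {
        return (long) numLocalDirs * checkLocalDirMs;
    }

    public static void main(String[] args) {
        // ~140 ms per public resource with 10+ local dirs at 14 ms each.
        long perResource = addResourceDelayMs(10, 14);
        // A container with hundreds of public resources multiplies that delay.
        long hundredResources = 100 * perResource;
        System.out.println(perResource + " ms per resource, "
            + hundredResources + " ms for 100 resources");
    }
}
```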

          I attached a patch, YARN-3491.000.patch, to fix the issue. The patch calls getInitializedLocalDirs only once per container.
          The original code calls getInitializedLocalDirs for every public resource, and a container can have hundreds of public resources, which is the situation in my previous log.

          Jason Lowe, could you review it? Thanks

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12726118/YARN-3491.000.patch
          against trunk revision 76e7264.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7374//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7374//console

          This message is automatically generated.

          zxu zhihai xu added a comment -

          I uploaded a new patch, YARN-3491.001.patch, for review.
          Thinking about it a bit more, the old patch may cause a big delay if multiple containers are submitted at the same time.
          For example, the following log shows 4 containers submitted at very close times:

          2015-04-07 21:42:22,071 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_e30_1426628374875_110648_01_078264 transitioned from NEW to LOCALIZING
          2015-04-07 21:42:22,074 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_e30_1426628374875_110652_01_093777 transitioned from NEW to LOCALIZING
          2015-04-07 21:42:22,076 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_e30_1426628374875_110668_01_049049 transitioned from NEW to LOCALIZING
          2015-04-07 21:42:22,078 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_e30_1426628374875_110668_01_085183 transitioned from NEW to LOCALIZING
          

          The new patch can overlap the delay with public localization from the previous container, which is a little better and more consistent with the behavior of the old code.
          It is also better for a container that has only private resources and no public resources: in that case, no delay is added to the Dispatcher thread.
          Finally, the change in the new patch is a little smaller than in the first patch.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12726221/YARN-3491.001.patch
          against trunk revision c6b5203.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          -1 eclipse:eclipse. The patch failed to build with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7383//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7383//console

          This message is automatically generated.

          zxu zhihai xu added a comment -

          I did more profiling in checkLocalDir, and the result really surprised me.
          The most time-consuming code is status.getPermission(), not lfs.getFileStatus.
          status.getPermission() takes 4 or 5 ms, and checkLocalDir calls status.getPermission() three times.
          That is why checkLocalDir takes 10+ ms.

            private boolean checkLocalDir(String localDir) {
          
              Map<Path, FsPermission> pathPermissionMap = getLocalDirsPathPermissionsMap(localDir);
          
              for (Map.Entry<Path, FsPermission> entry : pathPermissionMap.entrySet()) {
                FileStatus status;
                try {
                  status = lfs.getFileStatus(entry.getKey());
                } catch (Exception e) {
                  String msg =
                      "Could not carry out resource dir checks for " + localDir
                          + ", which was marked as good";
                  LOG.warn(msg, e);
                  throw new YarnRuntimeException(msg, e);
                }
          
                if (!status.getPermission().equals(entry.getValue())) {
                  String msg =
                      "Permissions incorrectly set for dir " + entry.getKey()
                          + ", should be " + entry.getValue() + ", actual value = "
                          + status.getPermission();
                  LOG.warn(msg);
                  throw new YarnRuntimeException(msg);
                }
              }
              return true;
            }
          

          Then I went deeper into the source code and found out why status.getPermission takes most of the time:
          lfs.getFileStatus returns a RawLocalFileSystem#DeprecatedRawLocalFileStatus:

              public FsPermission getPermission() {
                if (!isPermissionLoaded()) {
                  loadPermissionInfo();
                }
                return super.getPermission();
              }
          

          So status.getPermission calls loadPermissionInfo.
          Based on the following code, loadPermissionInfo is the bottleneck: it runs "ls -ld" to get the permission, which is really slow.

              /// loads permissions, owner, and group from `ls -ld`
              private void loadPermissionInfo() {
                IOException e = null;
                try {
                  String output = FileUtil.execCommand(new File(getPath().toUri()), 
                      Shell.getGetPermissionCommand());
                  StringTokenizer t =
                      new StringTokenizer(output, Shell.TOKEN_SEPARATOR_REGEX);
                  //expected format
                  //-rw-------    1 username groupname ...
                  String permission = t.nextToken();
                  if (permission.length() > FsPermission.MAX_PERMISSION_LENGTH) {
                    //files with ACLs might have a '+'
                    permission = permission.substring(0,
                      FsPermission.MAX_PERMISSION_LENGTH);
                  }
                  setPermission(FsPermission.valueOf(permission));
                  t.nextToken();
          
                  String owner = t.nextToken();
                  // If on windows domain, token format is DOMAIN\\user and we want to
                  // extract only the user name
                  if (Shell.WINDOWS) {
                    int i = owner.indexOf('\\');
                    if (i != -1)
                      owner = owner.substring(i + 1);
                  }
                  setOwner(owner);
          
                  setGroup(t.nextToken());
                } catch (Shell.ExitCodeException ioe) {
                  if (ioe.getExitCode() != 1) {
                    e = ioe;
                  } else {
                    setPermission(null);
                    setOwner(null);
                    setGroup(null);
                  }
                } catch (IOException ioe) {
                  e = ioe;
                } finally {
                  if (e != null) {
                    throw new RuntimeException("Error while running command to get " +
                                               "file permissions : " + 
                                               StringUtils.stringifyException(e));
                  }
                }
              }
          

          We should call getPermission as little as possible in the future.
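          One way to avoid the fork entirely is the java.nio.file API, which reads permissions with a stat-like call instead of executing "ls -ld". This is only a sketch assuming a POSIX filesystem, not what the Hadoop code in question did:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

public class NioPermissionCheck {
    // Reads permissions via java.nio (a native stat-style call) rather
    // than forking "ls -ld" the way loadPermissionInfo does. Throws
    // UnsupportedOperationException on non-POSIX filesystems (e.g. Windows).
    static String permissionString(Path p) throws IOException {
        Set<PosixFilePermission> perms = Files.getPosixFilePermissions(p);
        return PosixFilePermissions.toString(perms); // e.g. "rwxr-xr-x"
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("nm-local-dir");
        System.out.println(permissionString(dir));
    }
}
```

          Avoiding the per-call process fork is what makes the repeated permission checks cheap.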

          wilfreds Wilfred Spiegelenburg added a comment -

          Could this change not cause a new issue? What happens if a directory goes from bad to good while the localizer is running: could that leave us trying to use an uninitialised directory, causing failures that are difficult to detect until a new localizer is started and the directory is initialised? getLocalDirs() only returns "good" dirs, and thus only the good dirs get initialised.

          Looking over the code, there is also a lot of unneeded object creation that could be stripped out, speeding things up and lowering memory usage.

          zxu zhihai xu added a comment -

          Hi Wilfred Spiegelenburg, thanks for the review. A directory going from bad to good can happen at any time; it is asynchronous to both public and private resource localization. Even without my change, it can still happen right after the local and log dirs are initialized in the current code. Also, private resource localization initializes the local and log dirs per container, not per resource. Our purpose is to reduce the chance of failure.

          Looking over the code there is also a lot of unneeded object creation which could be stripped out speeding things up and lowering memory usage.

          I profiled PublicLocalizer#addResource; no other code took much time except checkLocalDir, which calls getPermission three times. getPermission runs the command "ls -ld" to get the permission, which is very slow.

          But your comment gave me a good idea for a better solution which can save more time:
          We can call LocalDirsHandlerService#getLastDisksCheckTime to get the timestamp of the previous disk check. Using this information, we only need to initialize the local and log dirs when the timestamp has changed. The timestamp only changes every two minutes by default, so we won't initialize the local and log dirs more than once every two minutes.

              diskHealthCheckInterval = conf.getLong(
                  YarnConfiguration.NM_DISK_HEALTH_CHECK_INTERVAL_MS,
                  YarnConfiguration.DEFAULT_NM_DISK_HEALTH_CHECK_INTERVAL_MS);
          public static final long DEFAULT_NM_DISK_HEALTH_CHECK_INTERVAL_MS = 120000L;
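          The timestamp gating described above can be sketched as a small self-contained class (a sketch under assumed names, not the actual patch; LocalDirsHandlerService#getLastDisksCheckTime would play the role of the timestamp supplier, and the expensive work would be initializing the local and log dirs):

```java
import java.util.function.LongSupplier;

/** Sketch (not the actual patch): re-run an expensive initialization only
 *  when an external "last check" timestamp has advanced. */
public class TimestampGatedInit {
  private final LongSupplier lastCheckTime; // e.g. getLastDisksCheckTime
  private final Runnable expensiveInit;     // e.g. initializing local/log dirs
  private long seen = Long.MIN_VALUE;

  public TimestampGatedInit(LongSupplier lastCheckTime, Runnable expensiveInit) {
    this.lastCheckTime = lastCheckTime;
    this.expensiveInit = expensiveInit;
  }

  /** Runs the initialization at most once per observed timestamp. */
  public synchronized boolean maybeInit() {
    long now = lastCheckTime.getAsLong();
    if (now == seen) {
      return false;          // disks not re-checked since the last init
    }
    seen = now;
    expensiveInit.run();
    return true;
  }
}
```

          With the default two-minute disk-check interval, every addResource call between disk checks takes the cheap early-return path.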
          

          Hi Jason Lowe, do you think my new idea is reasonable? I would greatly appreciate any feedback on it.

          zxu zhihai xu added a comment -

          I uploaded a new patch YARN-3491.002.patch for review. The new patch will only initialize the local and log Dirs when DisksCheckTime is changed.

          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 14m 34s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 1 new or modified test files.
          +1 whitespace 0m 0s The patch has no lines that end in whitespace.
          +1 javac 7m 32s There were no new javac warning messages.
          +1 javadoc 9m 34s There were no new javadoc warning messages.
          +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings.
          -1 checkstyle 5m 24s The applied patch generated 1 additional checkstyle issues.
          +1 install 1m 35s mvn install still works.
          +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse.
          +1 findbugs 1m 2s The patch does not introduce any new Findbugs (version 2.0.3) warnings.
          +1 yarn tests 5m 50s Tests passed in hadoop-yarn-server-nodemanager.
              46m 29s  



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12728188/YARN-3491.002.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / a00e001
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/7502/artifact/patchprocess/checkstyle-result-diff.txt
          hadoop-yarn-server-nodemanager test log https://builds.apache.org/job/PreCommit-YARN-Build/7502/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/7502/testReport/
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/7502/console

          This message was automatically generated.

          jira.shegalov Gera Shegalov added a comment -

          We should switch to io.nativeio.NativeIO.POSIX#getFstat as implementation in RawLocalFileSystem to get rid of shell-based implementation for FileStatus.

          zxu zhihai xu added a comment -

          Hi Gera Shegalov, thanks for the information. Could you give some details on how to switch to io.nativeio.NativeIO.POSIX#getFstat?
          Currently the attached patch tries to limit the number of calls to getInitializedLocalDirs; even if we switch to io.nativeio.NativeIO.POSIX#getFstat, the attached patch should still be useful. IMHO it will be good to decrease the number of calls to getInitializedLocalDirs and getInitializedLogDirs no matter which API we use.

          Should we create a separate follow-up JIRA for switching to io.nativeio.NativeIO.POSIX#getFstat?

          jira.shegalov Gera Shegalov added a comment -

          Agreed, reducing the number of system calls is a good idea. Using JNI instead of "ls" can be handled with a separate JIRA.

          zxu zhihai xu added a comment -

          Thanks Gera Shegalov! I created YARN-3549 for switching to io.nativeio.NativeIO.POSIX#getFstat.

          zxu zhihai xu added a comment -

          I uploaded a new patch, YARN-3491.003.patch, for review.
          I think the new patch YARN-3491.003 is better than the previous solutions.
          It solves the race condition ("a directory goes from bad to good") from Wilfred Spiegelenburg's comment.
          I added a small feature in DirectoryCollection.java: DirsChangeListener.
          ResourceLocalizationService can register a DirsChangeListener for localDirs and logDirs.
          Once DirectoryCollection#localDirs changes in checkDirs, ResourceLocalizationService will get the DirsChangeListener#onDirsChanged callback, which will call getInitializedLocalDirs and getInitializedLogDirs.
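          A minimal sketch of that listener mechanism (the class and method names mirror the ones described above, but the implementation here is assumed, not the actual patch):

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Sketch: a DirectoryCollection-like holder that notifies registered
 *  listeners whenever checkDirs() changes the set of good directories. */
public class DirectoryCollectionSketch {
  public interface DirsChangeListener {
    void onDirsChanged();
  }

  private final List<DirsChangeListener> listeners = new CopyOnWriteArrayList<>();
  private List<String> goodDirs = List.of();

  public void registerDirsChangeListener(DirsChangeListener l) {
    listeners.add(l);
  }

  /** Called by the periodic disk checker with the current set of good dirs. */
  public void checkDirs(List<String> newGoodDirs) {
    if (!newGoodDirs.equals(goodDirs)) {
      goodDirs = List.copyOf(newGoodDirs);
      for (DirsChangeListener l : listeners) {
        l.onDirsChanged(); // e.g. re-initialize local/log dirs
      }
    }
  }
}
```

          The callback fires only on an actual change, so the expensive re-initialization happens exactly when a directory transitions between good and bad rather than on every addResource call.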

          hadoopqa Hadoop QA added a comment -



          +1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 14m 39s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 2 new or modified test files.
          +1 javac 7m 33s There were no new javac warning messages.
          +1 javadoc 9m 31s There were no new javadoc warning messages.
          +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings.
          +1 checkstyle 0m 21s There were no new checkstyle issues.
          +1 whitespace 0m 1s The patch has no lines that end in whitespace.
          +1 install 1m 34s mvn install still works.
          +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse.
          +1 findbugs 1m 3s The patch does not introduce any new Findbugs (version 2.0.3) warnings.
          +1 yarn tests 5m 51s Tests passed in hadoop-yarn-server-nodemanager.
              41m 32s  



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12730114/YARN-3491.003.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / a319771
          hadoop-yarn-server-nodemanager test log https://builds.apache.org/job/PreCommit-YARN-Build/7681/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/7681/testReport/
          Java 1.7.0_55
          uname Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/7681/console

          This message was automatically generated.

          wilfreds Wilfred Spiegelenburg added a comment -

          Can we clean up getInitializedLocalDirs() and getInitializedLogDirs() now that we're changing them?
          Neither of the methods need to return anything since we do not use the return value. Also a rename of the methods would make it clearer:
          getInitializedLogDirs() --> initializeLogDirs()
          getInitializedLocalDirs() --> initializeLocalDirs()

          zxu zhihai xu added a comment -

          thanks Wilfred Spiegelenburg for the review. I uploaded a new patch YARN-3491.004.patch, which addressed all your comments.

          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 14m 43s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 2 new or modified test files.
          +1 javac 7m 33s There were no new javac warning messages.
          +1 javadoc 9m 36s There were no new javadoc warning messages.
          +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings.
          -1 checkstyle 0m 36s The applied patch generated 3 new checkstyle issues (total was 177, now 178).
          -1 whitespace 0m 1s The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix.
          +1 install 1m 33s mvn install still works.
          +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse.
          +1 findbugs 1m 2s The patch does not introduce any new Findbugs (version 2.0.3) warnings.
          +1 yarn tests 5m 57s Tests passed in hadoop-yarn-server-nodemanager.
              42m 5s  



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12730351/YARN-3491.004.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 338e88a
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/7700/artifact/patchprocess/diffcheckstylehadoop-yarn-server-nodemanager.txt
          whitespace https://builds.apache.org/job/PreCommit-YARN-Build/7700/artifact/patchprocess/whitespace.txt
          hadoop-yarn-server-nodemanager test log https://builds.apache.org/job/PreCommit-YARN-Build/7700/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/7700/testReport/
          Java 1.7.0_55
          uname Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/7700/console

          This message was automatically generated.

          hadoopqa Hadoop QA added a comment -



          +1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 14m 40s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 2 new or modified test files.
          +1 javac 7m 32s There were no new javac warning messages.
          +1 javadoc 9m 37s There were no new javadoc warning messages.
          +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings.
          +1 checkstyle 0m 36s There were no new checkstyle issues.
          +1 whitespace 0m 1s The patch has no lines that end in whitespace.
          +1 install 1m 32s mvn install still works.
          +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse.
          +1 findbugs 1m 3s The patch does not introduce any new Findbugs (version 2.0.3) warnings.
          +1 yarn tests 5m 51s Tests passed in hadoop-yarn-server-nodemanager.
              41m 52s  



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12730368/YARN-3491.004.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 338e88a
          hadoop-yarn-server-nodemanager test log https://builds.apache.org/job/PreCommit-YARN-Build/7701/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/7701/testReport/
          Java 1.7.0_55
          uname Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/7701/console

          This message was automatically generated.

          rkanter Robert Kanter added a comment -

          LGTM +1. I'll hold off on committing just yet in case anyone else has more comments first.

          hadoopqa Hadoop QA added a comment -



          +1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 14m 41s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 2 new or modified test files.
          +1 javac 7m 34s There were no new javac warning messages.
          +1 javadoc 9m 34s There were no new javadoc warning messages.
          +1 release audit 0m 23s The applied patch does not increase the total number of release audit warnings.
          +1 checkstyle 0m 37s There were no new checkstyle issues.
          +1 whitespace 0m 1s The patch has no lines that end in whitespace.
          +1 install 1m 34s mvn install still works.
          +1 eclipse:eclipse 0m 32s The patch built with eclipse:eclipse.
          +1 findbugs 1m 3s The patch does not introduce any new Findbugs (version 2.0.3) warnings.
          +1 yarn tests 5m 51s Tests passed in hadoop-yarn-server-nodemanager.
              41m 54s  



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12730710/YARN-3491.004.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / a583a40
          hadoop-yarn-server-nodemanager test log https://builds.apache.org/job/PreCommit-YARN-Build/7727/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/7727/testReport/
          Java 1.7.0_55
          uname Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/7727/console

          This message was automatically generated.

          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 14m 53s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 1 new or modified test files.
          +1 javac 7m 50s There were no new javac warning messages.
          +1 javadoc 10m 3s There were no new javadoc warning messages.
          +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings.
          +1 checkstyle 0m 49s There were no new checkstyle issues.
          +1 whitespace 0m 1s The patch has no lines that end in whitespace.
          +1 install 1m 35s mvn install still works.
          +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse.
          +1 findbugs 1m 17s The patch does not introduce any new Findbugs (version 2.0.3) warnings.
          -1 yarn tests 49m 44s Tests failed in hadoop-yarn-server-resourcemanager.
              87m 16s  



          Reason Tests
          Failed unit tests hadoop.yarn.server.resourcemanager.TestClientRMTokens
            hadoop.yarn.server.resourcemanager.TestRMRestart



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12730708/YARN-3385.004.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 90b3845
          hadoop-yarn-server-resourcemanager test log https://builds.apache.org/job/PreCommit-YARN-Build/7725/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/7725/testReport/
          Java 1.7.0_55
          uname Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/7725/console

          This message was automatically generated.

          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 14m 43s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 2 new or modified test files.
          +1 javac 7m 36s There were no new javac warning messages.
          +1 javadoc 9m 38s There were no new javadoc warning messages.
          +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings.
          +1 checkstyle 0m 36s There were no new checkstyle issues.
          +1 whitespace 0m 1s The patch has no lines that end in whitespace.
          +1 install 1m 34s mvn install still works.
          +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse.
          +1 findbugs 1m 2s The patch does not introduce any new Findbugs (version 2.0.3) warnings.
          -1 yarn tests 5m 48s Tests failed in hadoop-yarn-server-nodemanager.
              41m 58s  



          Reason Tests
          Failed unit tests hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12730730/YARN-3491.004.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / a583a40
          hadoop-yarn-server-nodemanager test log https://builds.apache.org/job/PreCommit-YARN-Build/7729/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/7729/testReport/
          Java 1.7.0_55
          uname Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/7729/console

          This message was automatically generated.

          hadoopqa Hadoop QA added a comment -



          +1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 14m 43s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 2 new or modified test files.
          +1 javac 7m 31s There were no new javac warning messages.
          +1 javadoc 9m 33s There were no new javadoc warning messages.
          +1 release audit 0m 23s The applied patch does not increase the total number of release audit warnings.
          +1 checkstyle 0m 37s There were no new checkstyle issues.
          +1 whitespace 0m 1s The patch has no lines that end in whitespace.
          +1 install 1m 35s mvn install still works.
          +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse.
          +1 findbugs 1m 3s The patch does not introduce any new Findbugs (version 2.0.3) warnings.
          +1 yarn tests 6m 1s Tests passed in hadoop-yarn-server-nodemanager.
              42m 10s  



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12730772/YARN-3491.004.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / a583a40
          hadoop-yarn-server-nodemanager test log https://builds.apache.org/job/PreCommit-YARN-Build/7730/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/7730/testReport/
          Java 1.7.0_55
          uname Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/7730/console

          This message was automatically generated.

          adhoot Anubhav Dhoot added a comment -

          LGTM

          rkanter Robert Kanter added a comment -

          Thanks Zhihai. Committed to trunk and branch-2!

          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #7750 (See https://builds.apache.org/job/Hadoop-trunk-Commit/7750/)
          YARN-3491. PublicLocalizer#addResource is too slow. (zxu via rkanter) (rkanter: rev b72507810aece08e17ab4b5aae1f7eae1fe98609)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DirectoryCollection.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDirectoryCollection.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
          • hadoop-yarn-project/CHANGES.txt
          vinodkv Vinod Kumar Vavilapalli added a comment -

          zhihai xu, interesting JIRA and great profiling!

          zxu zhihai xu added a comment -

          thanks Jason Lowe, Sangjin Lee and Gera Shegalov for the valuable suggestions.
          thanks Wilfred Spiegelenburg and Anubhav Dhoot for reviewing the patch.
          thanks Robert Kanter for the review and committing the patch.
          thanks Vinod Kumar Vavilapalli for your feedback!

          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #187 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/187/)
          YARN-3491. PublicLocalizer#addResource is too slow. (zxu via rkanter) (rkanter: rev b72507810aece08e17ab4b5aae1f7eae1fe98609)

          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDirectoryCollection.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DirectoryCollection.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk #920 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/920/)
          YARN-3491. PublicLocalizer#addResource is too slow. (zxu via rkanter) (rkanter: rev b72507810aece08e17ab4b5aae1f7eae1fe98609)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DirectoryCollection.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDirectoryCollection.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk #2118 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2118/)
          YARN-3491. PublicLocalizer#addResource is too slow. (zxu via rkanter) (rkanter: rev b72507810aece08e17ab4b5aae1f7eae1fe98609)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DirectoryCollection.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDirectoryCollection.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #177 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/177/)
          YARN-3491. PublicLocalizer#addResource is too slow. (zxu via rkanter) (rkanter: rev b72507810aece08e17ab4b5aae1f7eae1fe98609)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDirectoryCollection.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DirectoryCollection.java
          • hadoop-yarn-project/CHANGES.txt
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #187 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/187/)
          YARN-3491. PublicLocalizer#addResource is too slow. (zxu via rkanter) (rkanter: rev b72507810aece08e17ab4b5aae1f7eae1fe98609)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDirectoryCollection.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DirectoryCollection.java
          • hadoop-yarn-project/CHANGES.txt
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2136 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2136/)
          YARN-3491. PublicLocalizer#addResource is too slow. (zxu via rkanter) (rkanter: rev b72507810aece08e17ab4b5aae1f7eae1fe98609)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DirectoryCollection.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDirectoryCollection.java
          brahmareddy Brahma Reddy Battula added a comment -

          I feel, this should go in branch-2.7 as well..?


            People

            • Assignee: zhihai xu
            • Reporter: zhihai xu