Hadoop YARN / YARN-6630

Container work dir could not be recovered when NM restarts


Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved

    Description

      When the ContainerRetryPolicy is NEVER_RETRY (i.e. container.isRetryContextSet() returns false), the container work dir is not saved in the NM state store.

      ContainerLaunch.java
      ...
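        private void recordContainerWorkDir(ContainerId containerId,
            String workDir) throws IOException {
          container.setWorkDir(workDir);
          // Persisted only when a retry context is set, i.e. the retry policy
          // is not NEVER_RETRY; otherwise workDir exists only in NM memory.
          if (container.isRetryContextSet()) {
            context.getNMStateStore().storeContainerWorkDir(containerId, workDir);
          }
        }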
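      To make the failure mode concrete, here is a minimal, self-contained model of the logic above. It is a sketch under stated assumptions: InMemoryStateStore, ModelContainer and recover are hypothetical stand-ins for the NM state store, ContainerImpl and NM recovery, not Hadoop classes, and the container id and path are illustrative. Only the isRetryContextSet() condition mirrors the excerpt.

```java
import java.util.HashMap;
import java.util.Map;

class ConditionalWorkDirModel {
    // Hypothetical stand-in for the NM state store; its contents survive an "NM restart".
    static class InMemoryStateStore {
        private final Map<String, String> workDirs = new HashMap<>();

        void storeContainerWorkDir(String containerId, String workDir) {
            workDirs.put(containerId, workDir);
        }

        // Returns null when nothing was persisted for the container.
        String loadContainerWorkDir(String containerId) {
            return workDirs.get(containerId);
        }
    }

    // Hypothetical stand-in for ContainerImpl; its fields live only in NM memory.
    static class ModelContainer {
        final String id;
        final boolean retryContextSet; // false models ContainerRetryPolicy.NEVER_RETRY
        String workDir;

        ModelContainer(String id, boolean retryContextSet) {
            this.id = id;
            this.retryContextSet = retryContextSet;
        }
    }

    // Mirrors recordContainerWorkDir above: persist only when a retry context is set.
    static void recordContainerWorkDir(InMemoryStateStore store, ModelContainer c, String workDir) {
        c.workDir = workDir;
        if (c.retryContextSet) {
            store.storeContainerWorkDir(c.id, workDir);
        }
    }

    // Models NM restart: in-memory state is gone, so workDir comes only from the store.
    static ModelContainer recover(InMemoryStateStore store, String id, boolean retryContextSet) {
        ModelContainer c = new ModelContainer(id, retryContextSet);
        c.workDir = store.loadContainerWorkDir(id);
        return c;
    }

    public static void main(String[] args) {
        InMemoryStateStore store = new InMemoryStateStore();
        ModelContainer c = new ModelContainer("container_01", false);
        recordContainerWorkDir(store, c, "/tmp/nm/container_01");
        System.out.println("before restart: " + c.workDir);
        System.out.println("after restart:  " + recover(store, c.id, false).workDir);
    }
}
```

      For a container without a retry context, main prints the work dir before the simulated restart and null after it; that null corresponds to the container.workDir that ResourceLocalizedWhileRunningTransition later passes to new Path(...).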
      

      When the NM then restarts, container.workDir cannot be recovered and stays null, which can lead to exceptions.
      We have already hit one: after an NM restart, sending a resource localization request for a running container (YARN-1503) makes the NM fail with the following exception.
      Therefore container.workDir should always be saved in the NM state store, regardless of the retry policy.
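      The proposed change can be modeled the same way: persist the work dir unconditionally, so that recovery no longer depends on the retry policy. This is a hypothetical sketch (AlwaysPersistModel and its map-backed store are illustrative names), not the content of the attached patches.

```java
import java.util.HashMap;
import java.util.Map;

class AlwaysPersistModel {
    // Hypothetical persistent store keyed by container id; survives an "NM restart".
    private final Map<String, String> stateStore = new HashMap<>();

    // Proposed behavior: record workDir regardless of the container's retry policy.
    void recordContainerWorkDir(String containerId, String workDir, boolean retryContextSet) {
        // retryContextSet is deliberately ignored: the retry policy no longer gates persistence.
        stateStore.put(containerId, workDir);
    }

    // Models NM restart: in-memory container state is gone, only the store remains.
    String recoverWorkDir(String containerId) {
        return stateStore.get(containerId);
    }

    public static void main(String[] args) {
        AlwaysPersistModel nm = new AlwaysPersistModel();
        nm.recordContainerWorkDir("container_01", "/tmp/nm/container_01", false);
        System.out.println("recovered workDir = " + nm.recoverWorkDir("container_01"));
    }
}
```

      The trade-off is one additional state-store write per launched container, including containers that will never be relaunched.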

      ContainerImpl.java
        static class ResourceLocalizedWhileRunningTransition
            extends ContainerTransition {
      ...
                // container.workDir is null here if it was not recovered after an NM restart
                String linkFile = new Path(container.workDir, link).toString();
      ...
      
      java.lang.IllegalArgumentException: Can not create a Path from a null string
              at org.apache.hadoop.fs.Path.checkPathArg(Path.java:159)
              at org.apache.hadoop.fs.Path.<init>(Path.java:175)
              at org.apache.hadoop.fs.Path.<init>(Path.java:110)
      ... ...
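      As the trace shows, new Path(container.workDir, link) ends in Path.checkPathArg, which rejects the null work dir with a generic message. A caller could fail fast with a clearer one. This is a minimal sketch assuming a hypothetical helper linkFileFor; it uses java.nio.file rather than org.apache.hadoop.fs.Path so that it runs without Hadoop on the classpath, and it is not part of the attached patches.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Objects;

class LinkFileSketch {
    // Hypothetical helper: resolve a localized resource's link inside the container work dir.
    // Fails fast with a message naming the container, instead of a generic null-path error.
    static String linkFileFor(String containerId, String workDir, String link) {
        if (workDir == null) {
            throw new IllegalStateException(
                "workDir of " + containerId + " is unknown (not recovered after NM restart?)");
        }
        Objects.requireNonNull(link, "link");
        Path linkFile = Paths.get(workDir, link);
        return linkFile.toString();
    }

    public static void main(String[] args) {
        System.out.println(linkFileFor("container_01", "/tmp/nm/container_01", "job.jar"));
        try {
            linkFileFor("container_01", null, "job.jar");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

      This only makes the failure easier to diagnose; the localization request still cannot be served until the work dir is persisted.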
      

      Attachments

        1. YARN-6630.003.patch (16 kB, Yang Wang)
        2. YARN-6630.002.patch (8 kB, Yang Wang)
        3. YARN-6630.001.patch (19 kB, Yang Wang)


            People

              Assignee: Yang Wang (wangyang0918)
              Reporter: Yang Wang (wangyang0918)
              Votes: 0
              Watchers: 4
