Hadoop Map/Reduce: MAPREDUCE-4444

nodemanager fails to start when one of the local-dirs is bad

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.23.3, 2.0.0-alpha, 3.0.0
    • Fix Version/s: 0.23.3, 2.0.2-alpha
    • Component/s: nodemanager
    • Labels:
      None

      Activity

      Hudson added a comment -

      Integrated in Hadoop-Mapreduce-trunk #1154 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1154/)
      MAPREDUCE-4444. nodemanager fails to start when one of the local-dirs is bad (Jason Lowe via bobby) (Revision 1367783)

      Result = FAILURE
      bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1367783
      Files :

      • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
      • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java
      • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestDiskFailures.java
      Hudson added a comment -

      Integrated in Hadoop-Hdfs-trunk #1122 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1122/)
      MAPREDUCE-4444. nodemanager fails to start when one of the local-dirs is bad (Jason Lowe via bobby) (Revision 1367783)

      Result = FAILURE
      bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1367783
      Files :

      • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
      • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java
      • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestDiskFailures.java
      Hudson added a comment -

      Integrated in Hadoop-Hdfs-0.23-Build #331 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/331/)
      svn merge -c 1367783 FIXES: MAPREDUCE-4444. nodemanager fails to start when one of the local-dirs is bad (Jason Lowe via bobby) (Revision 1367785)

      Result = SUCCESS
      bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1367785
      Files :

      • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
      • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java
      • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestDiskFailures.java
      Hudson added a comment -

      Integrated in Hadoop-Mapreduce-trunk-Commit #2562 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2562/)
      MAPREDUCE-4444. nodemanager fails to start when one of the local-dirs is bad (Jason Lowe via bobby) (Revision 1367783)

      Result = FAILURE
      bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1367783
      Files :

      • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
      • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java
      • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestDiskFailures.java
      Hudson added a comment -

      Integrated in Hadoop-Hdfs-trunk-Commit #2608 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2608/)
      MAPREDUCE-4444. nodemanager fails to start when one of the local-dirs is bad (Jason Lowe via bobby) (Revision 1367783)

      Result = SUCCESS
      bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1367783
      Files :

      • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
      • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java
      • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestDiskFailures.java
      Hudson added a comment -

      Integrated in Hadoop-Common-trunk-Commit #2543 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2543/)
      MAPREDUCE-4444. nodemanager fails to start when one of the local-dirs is bad (Jason Lowe via bobby) (Revision 1367783)

      Result = SUCCESS
      bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1367783
      Files :

      • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
      • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java
      • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestDiskFailures.java
      Robert Joseph Evans added a comment -

      Thanks Jason, +1 for the change. I put this into trunk, branch-2, and branch-0.23.

      Hadoop QA added a comment -

      +1 overall. Here are the results of testing the latest attachment
      http://issues.apache.org/jira/secure/attachment/12538396/MAPREDUCE-4444.patch
      against trunk revision .

      +1 @author. The patch does not contain any @author tags.

      +1 tests included. The patch appears to include 1 new or modified test files.

      +1 javac. The applied patch does not increase the total number of javac compiler warnings.

      +1 javadoc. The javadoc tool did not generate any warning messages.

      +1 eclipse:eclipse. The patch built with eclipse:eclipse.

      +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

      +1 release audit. The applied patch does not increase the total number of release audit warnings.

      +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.

      +1 contrib tests. The patch passed contrib unit tests.

      Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2676//testReport/
      Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2676//console

      This message is automatically generated.

      Jason Lowe added a comment -

      Patch that changes LocalDirsHandlerService to check for bad directories during init so they're removed from the list of directories before subsequent init code tries to access them.
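
The idea described above can be illustrated with a minimal, self-contained sketch. This is not the code from the attached patch: the class name `LocalDirFilterSketch` and the helpers `isUsable` and `filterGoodDirs` are hypothetical, and the sketch uses plain `java.io.File` checks rather than Hadoop's own disk-checking utilities. It only shows the general pattern of dropping unusable directories from the configured list before any later initialization step touches them.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LocalDirFilterSketch {

    /**
     * Hypothetical health check: the directory must exist (or be creatable)
     * and be readable, writable and executable by the current user.
     */
    static boolean isUsable(File dir) {
        if (!dir.exists() && !dir.mkdirs()) {
            return false; // cannot create it, e.g. read-only file system
        }
        return dir.isDirectory() && dir.canRead() && dir.canWrite() && dir.canExecute();
    }

    /** Keeps only the usable directories, preserving the configured order. */
    static List<String> filterGoodDirs(List<String> configuredDirs) {
        List<String> good = new ArrayList<>();
        for (String path : configuredDirs) {
            if (isUsable(new File(path))) {
                good.add(path);
            }
        }
        return good;
    }

    public static void main(String[] args) throws IOException {
        // A directory that is known to be usable.
        File goodDir = Files.createTempDirectory("nm-good").toFile();
        // A path nested under a regular file can never be created,
        // which stands in for a "bad" local dir in this demo.
        File plainFile = File.createTempFile("nm-bad", ".tmp");
        String badDir = new File(plainFile, "child").getPath();

        List<String> good = filterGoodDirs(Arrays.asList(badDir, goodDir.getPath()));
        // Later init code would iterate over 'good' only.
        System.out.println("Usable local dirs: " + good);
    }
}
```

Running the filter once during init means that later steps, such as creating per-directory subdirectories, only ever see directories that passed the check.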

      Nathan Roberts added a comment -

      disk_fail_in_place should allow a volume to fail and for the nodemanager to continue to function. It does seem to obey yarn.nodemanager.disk-health-checker.min-healthy-disks while it's up, but after a disk has failed, it no longer starts.

      [main] 2012-07-11 20:58:19,857 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
      [main] org.apache.hadoop.yarn.YarnException: Failed to initialize LocalizationService
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.init(ResourceLocalizationService.java:202)
      at org.apache.hadoop.yarn.service.CompositeService.init(CompositeService.java:58)
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.init(ContainerManagerImpl.java:183)
      at org.apache.hadoop.yarn.service.CompositeService.init(CompositeService.java:58)
      at org.apache.hadoop.yarn.server.nodemanager.NodeManager.init(NodeManager.java:159)
      at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:260)
      at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:276)
      Caused by: EROFS: Read-only file system
      at org.apache.hadoop.io.nativeio.NativeIO.chmod(Native Method)
      at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:562)
      at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:369)
      at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:888)
      at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
      at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
      at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:700)
      at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:697)
      at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2319)
      at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:697)
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.init(ResourceLocalizationService.java:188)
      ... 6 more
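
For context, the `yarn.nodemanager.disk-health-checker.min-healthy-disks` threshold mentioned in this report can be pictured as a simple ratio test. The sketch below assumes the property is a minimum fraction of healthy directories out of all configured ones; the class `MinHealthyDisksSketch` and the method `hasEnoughHealthyDisks` are hypothetical names for illustration, not NodeManager code.

```java
public class MinHealthyDisksSketch {

    /**
     * Returns true when the share of healthy directories meets the
     * configured minimum fraction (assumed semantics of the property).
     */
    static boolean hasEnoughHealthyDisks(int healthy, int configured, float minHealthyFraction) {
        if (configured <= 0) {
            return false; // nothing configured, so nothing can be healthy
        }
        return (float) healthy / configured >= minHealthyFraction;
    }

    public static void main(String[] args) {
        // One failed disk out of four still leaves 75% healthy.
        System.out.println(hasEnoughHealthyDisks(3, 4, 0.25f)); // prints "true"
        // All four disks failed.
        System.out.println(hasEnoughHealthyDisks(0, 4, 0.25f)); // prints "false"
    }
}
```

Under this reading, a node with one read-only volume out of several would still be above the threshold, which is why a startup failure in that state is unexpected.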


        People

        • Assignee: Jason Lowe
        • Reporter: Nathan Roberts
        • Votes: 0
        • Watchers: 11
