Hadoop HDFS
HDFS-326

Add a lifecycle interface for Hadoop components: namenodes, job clients, etc.

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      I'd like to propose we have a standard interface for Hadoop components, the things that get started or stopped when you bring up a namenode. Currently, some of these classes have a stop() or shutdown() method, but there is no standard name/interface, no way of seeing if they are live, and no reliable way of checking their health or shutting them down. Indeed, there is a tendency for the spawned threads to not want to die, so the entire process has to be killed to stop the workers.

      Having a standard interface would make it easier to:

      • manage the different components from management tools
      • monitor the state of those components
      • subclass them

      The latter is interesting because right now TaskTracker and JobTracker start up threads in their constructors; that's very dangerous, as subclasses may have their methods called before they are fully initialised. Adding this interface would be the right time to clean up the startup process so that subclassing is less risky.
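
      As a minimal sketch of that hazard (UnsafeDaemon and SafeDaemon are hypothetical names, not Hadoop classes): a constructor that spawns a thread may invoke an overridable method before a subclass has finished constructing, whereas an explicit lifecycle defers thread creation to start().

      // Illustrative only -- not the actual TaskTracker/JobTracker code.
      class UnsafeDaemon {
        UnsafeDaemon() {
          // The new thread may call runLoop() before a subclass constructor has run.
          new Thread(new Runnable() {
            public void run() { runLoop(); }
          }).start();
        }
        protected void runLoop() { /* may observe half-initialised subclass state */ }
      }

      // With a lifecycle, the constructor only assigns fields; worker threads are
      // started in start(), after init() has run and the object is fully built.
      class SafeDaemon {
        private Thread worker;
        public void init() { /* read in and validate configuration */ }
        public void start() {
          worker = new Thread(new Runnable() {
            public void run() { runLoop(); }
          });
          worker.start();
        }
        protected void runLoop() { }
      }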

      1. HDFS-326.patch
        17 kB
        Steve Loughran
      2. hadoop-lifecycle-tomw.sxw
        144 kB
        steve_l
      3. hadoop-lifecycle.sxw
        143 kB
        steve_l
      4. hadoop-lifecycle.pdf
        316 kB
        steve_l
      5. hadoop-lifecycle.pdf
        321 kB
        steve_l
      6. HADOOP-3628-19.patch
        347 kB
        steve_l
      7. HADOOP-3628-18.patch
        126 kB
        steve_l
      8. hadoop-3628.patch
        49 kB
        steve_l
      9. hadoop-3628.patch
        55 kB
        steve_l
      10. hadoop-3628.patch
        70 kB
        steve_l
      11. hadoop-3628.patch
        55 kB
        steve_l
      12. hadoop-3628.patch
        96 kB
        steve_l
      13. hadoop-3628.patch
        121 kB
        steve_l
      14. hadoop-3628.patch
        147 kB
        steve_l
      15. hadoop-3628.patch
        152 kB
        steve_l
      16. hadoop-3628.patch
        139 kB
        steve_l
      17. hadoop-3628.patch
        143 kB
        steve_l
      18. hadoop-3628.patch
        142 kB
        steve_l
      19. hadoop-3628.patch
        154 kB
        steve_l
      20. hadoop-3628.patch
        165 kB
        steve_l
      21. hadoop-3628.patch
        164 kB
        steve_l
      22. hadoop-3628.patch
        164 kB
        steve_l
      23. hadoop-3628.patch
        167 kB
        steve_l
      24. hadoop-3628.patch
        162 kB
        steve_l
      25. AbstractHadoopComponent.java
        3 kB
        steve_l


          Activity

          steve_l added a comment -

          The interface I've been using in my prototype. init() is for initialisation (and subclassing); start() starts all worker threads; ping() checks internal health; terminate() shuts things down. Everything throws exceptions except terminate(), which is required to do a best-effort shutdown (catching, logging and continuing).

          public interface HadoopComponentLifecycle {

            /**
             * Initialize; read in and validate values.
             * @throws IOException for any initialisation failure
             */
            void init() throws IOException;

            /**
             * Start any work (in separate threads).
             * @throws IOException for any startup failure
             */
            void start() throws IOException;

            /**
             * Ping: only valid when started.
             * @throws IOException for any ping failure
             */
            void ping() throws IOException;

            /**
             * Shut down. This must be idempotent and turn errors into log/warn
             * events; do your best to clean up even in the face of adversity.
             */
            void terminate();

            /**
             * Get the current state.
             * @return the lifecycle state
             */
            State getLifecycleState();

            /**
             * The lifecycle state. Failure is a weird one, as it often takes a
             * side-effecting test (or an outsider) to observe it.
             */
            enum State { CREATED, INITIALIZED, STARTED, FAILED, TERMINATED }
          }
          Doug Cutting added a comment -

          +1 to the concept. However I suggest using an abstract class instead of an interface as it will make it possible to evolve the interface without breaking existing implementations.

          steve_l added a comment -

          OK, an abstract class like the one attached is easily done.

          Some thoughts.

          1. Should the terminate() operation return a list of the throwables that were caught during termination? That way, rather than just logging problems on shutdown, whatever initiates the shutdown can deal with them by logging/rethrowing, etc.

          2. Is the set of states right? Do we want a STARTING state that components remain in until they consider themselves live?

          3. The class is a base class of Configurable. I plan to move the code that reads in the configuration from the constructors into init(); that way subclasses can do tricks such as manipulate the configuration before it is read.

          4. Should the base methods be synchronized? The alternative is to leave that to the subclasses, as appropriate. In our experience, leaving the terminate() operation non-synchronized is good for avoiding deadlocks; for the others, synchronized is generally good.

          5. The base class can verify the system is in the right state, throwing exceptions if not.

          6. I'll add a HadoopException that extends IOException too.

          7. Is org.apache.hadoop.conf the right package?

          Test-wise, we can add tests that check all of this, with a mock object that can be configured to fail in any of the method calls, so as to stress any container.
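
          To make points 1, 4 and 5 concrete, here is a minimal sketch of how such a base class could look. It is not the attached AbstractHadoopComponent.java; the names (AbstractComponentSketch, enterState, stopWorkers) are illustrative, and terminate() is shown with the list-of-throwables return type from point 1.

          import java.io.IOException;
          import java.util.ArrayList;
          import java.util.List;

          public abstract class AbstractComponentSketch {

            public enum State { CREATED, INITIALIZED, STARTED, FAILED, TERMINATED }

            private volatile State state = State.CREATED;

            /** Point 5: the base class verifies the component is in the expected state. */
            protected synchronized void enterState(State expected, State next) throws IOException {
              if (state != expected) {
                throw new IOException("Cannot enter " + next + " from " + state);
              }
              state = next;
            }

            /** Point 4: init() and start() are synchronized... */
            public synchronized void init() throws IOException {
              enterState(State.CREATED, State.INITIALIZED);
            }

            public synchronized void start() throws IOException {
              enterState(State.INITIALIZED, State.STARTED);
            }

            /**
             * ...but terminate() is not, to avoid shutdown deadlocks. Point 1: it hands
             * the throwables it caught back to whoever initiated the shutdown.
             */
            public List<Throwable> terminate() {
              List<Throwable> problems = new ArrayList<Throwable>();
              try {
                stopWorkers();
              } catch (Throwable t) {
                problems.add(t);
              }
              state = State.TERMINATED;
              return problems;
            }

            public State getLifecycleState() {
              return state;
            }

            /** Hypothetical hook: subclasses stop their worker threads here. */
            protected abstract void stopWorkers() throws IOException;
          }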

          Amar Kamat added a comment -

          I would go for STARTING, INITIALIZING, RUNNING, READY, FAILED, TERMINATED. Descriptions follow:
          STARTING : The object is created but not initialized.
          INITIALIZING : Getting initialized.
          RUNNING : It's up but not yet ready. This could be the case where some internal operations can start but no external requests should be handled.
          READY : The ultimate state.
          FAILED/TERMINATED : Failed/terminated.
          The distinction between READY-INTERNAL (RUNNING) and READY-EXTERNAL is important for HADOOP-3245.
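
          Written out as an enum (a sketch only; the name ProposedState and the comments paraphrase the descriptions above):

          public enum ProposedState {
            STARTING,      // object created but not initialized
            INITIALIZING,  // getting initialized
            RUNNING,       // up; internal operations may start, but no external requests yet
            READY,         // the ultimate state: ready for external requests
            FAILED,
            TERMINATED
          }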

          steve_l added a comment -

          The running/ready differentiation looks good. What is important is to have a set of states that can be easily aggregated; if we are bringing up a big system, it's READY when its inner parts are READY; if the namenode is only RUNNING, then the cluster itself isn't live.

          I've been thinking that org.apache.hadoop.lifecycle would be better, as then I would have a space to add stuff alongside this.
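
          A hypothetical helper (not part of any patch) shows the kind of aggregation meant here: the composite is only as ready as its least-ready part, and any failed or terminated part makes it not live.

          import java.util.List;

          public final class ClusterStateSketch {

            public enum ProposedState { STARTING, INITIALIZING, RUNNING, READY, FAILED, TERMINATED }

            /** The aggregate state is the least-advanced state of any part. */
            public static ProposedState aggregate(List<ProposedState> parts) {
              ProposedState worst = ProposedState.READY;
              for (ProposedState s : parts) {
                if (s == ProposedState.FAILED || s == ProposedState.TERMINATED) {
                  return s;                       // any dead part means the cluster is not live
                }
                if (s.ordinal() < worst.ordinal()) {
                  worst = s;                      // e.g. a RUNNING namenode caps the cluster at RUNNING
                }
              }
              return worst;
            }
          }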

          Doug Cutting added a comment -

          This does not belong in the conf package; however, I'm not yet convinced it needs its own package. Do we expect a lot of support classes to be added, or that this will mostly be standalone? If standalone, then util might suffice.

          My intuition is to call this something like Daemon or Service, not LifeCycle. Are there uses which don't fit those names?

          steve_l added a comment -

          Daemon is already taken in util: it's a thread that calls setDaemon(true). Service would work.

          I'm running the tests on a prototype implementation; namenode and datanode have been reparented onto Service; let's see what goes wrong. What worries me most is terminate(); in my own tests of the existing code I've found that some things don't shut down very well and ports stay open until the process is killed...

          steve_l added a comment -

          This patch creates an abstract base class Service (in util) with a lifecycle, retrofits it to NameNode, DataNode and JobTracker, and makes the changes to MiniDFS needed to get all the tests to work (at least here). It is up to date with the movements of the dfs code, which is why I'm submitting today and not last Friday.

          -it also adds a standalone test that MiniDFS works. MiniDFS could be enhanced to share the same base class lifecycle; this could be done while improving its startup/shutdown times, which is the main reason the test cycle for hadoop is so long.

          -it doesn't make any major attempt to fix the problems with startup/shutdown of services that have been noted in filed bug reports, except that service termination is now synchronized. Once the patch is in, other lifecycle issues with the nodes can be looked at.

          steve_l added a comment -

          Patch submitted for review/comment. Everything is designed to be backwards compatible, except that nodes don't start in their constructors. The factory methods all do a complete deployment, but it means that anyone subclassing the (package-scoped) classes will find their code has to change. I think I'm the only person who has done this, and with a lifecycle in Hadoop itself, my subclassing is no longer needed.

          Doug Cutting added a comment -

          This is looking good to me.

          > retrofits to NameNode, DataNode, JobTracker

          Should we convert TaskTracker and SecondaryNameNode too?

          steve_l added a comment -

          Yes, we should have a base class for all of them. But each one can be migrated individually; it would benefit me if we could get the basic structure in and track the source as it moves around, then:

          • bring the other services in
          • work on clean shutdowns
          • track down the reasons for minidfs startup/shutdown pauses

          feature creep

          • consider adding a standard notification mechanism for state changes
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12385401/hadoop-3628.patch
          against trunk revision 674671.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 9 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2809/console

          This message is automatically generated.

          steve_l added a comment -

          updated patch that resolves (whitespace) incompatibilities between SVN HEAD and the previous patch.

          steve_l added a comment -

          resubmitting

          steve_l added a comment -

          Looking at TaskTracker, it can also gain this lifecycle. The big complication is that it has an inner part that is started/stopped with initialize()/close(), inside a bigger lifecycle that manages the web service; the latter is only terminated with shutdown().

          So TaskTracker would need adapting as follows:

          -the start() operation starts everything up
          -the terminate() operation is mapped to TaskTracker.shutdown()
          -ping() checks for the presence of the web server only
          -initialize() and close() do not change service state; they just reset a running service
          -if you try to initialize() a service that is not yet started, it is started

          This is a bit fiddly, but would keep the basic idea: you can close() and initialize() to reset most, but not all, of the service.
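
          A sketch of that mapping (illustrative only; TaskTrackerServiceSketch and its private methods stand in for the real TaskTracker operations):

          import java.io.IOException;

          public class TaskTrackerServiceSketch {

            private boolean started;

            public void start() throws IOException {
              started = true;
              initialize();                 // start() brings everything up
            }

            public void terminate() {
              shutdown();                   // terminate() maps to TaskTracker.shutdown()
            }

            public void ping() throws IOException {
              if (!webServerIsUp()) {       // ping() only checks the web server
                throw new IOException("TaskTracker web server is not running");
              }
            }

            /** Resets the running service; does not change the service state. */
            public void reinitialize() throws IOException {
              if (!started) {
                start();                    // initialising an unstarted service starts it
              } else {
                close();
                initialize();
              }
            }

            // Stand-ins for the existing TaskTracker operations.
            private void initialize() throws IOException { }
            private void close() { }
            private void shutdown() { }
            private boolean webServerIsUp() { return started; }
          }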

          steve_l added a comment -

          On the subject of trackers, is there any reason why JobTracker.stopTracker() doesn't close its filesystem, i.e. doesn't call fs.close()?

          If you look at the usages of stopTracker(), all but one of them call jt.fs.close() first, which means they assume that jt.fs != null and that closing doesn't cause problems. The only case where fs.close() isn't invoked is in JobTracker.addHostToNodeMapping(), and that method is unusual in that it logs at FATAL and calls System.exit() if it cannot stop the tracker.

          I propose always closing the filesystem when the tracker is stopped (if needed), and removing those lines in the code where it is externally stopped.
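
          A sketch of the proposal (TrackerShutdownSketch is not the real JobTracker; the point is only that the tracker closes its own filesystem, best-effort, inside its stop path):

          import java.io.IOException;
          import org.apache.commons.logging.Log;
          import org.apache.commons.logging.LogFactory;
          import org.apache.hadoop.fs.FileSystem;

          public class TrackerShutdownSketch {

            private static final Log LOG = LogFactory.getLog(TrackerShutdownSketch.class);
            private FileSystem fs;

            public void stopTracker() {
              try {
                if (fs != null) {
                  fs.close();               // close the filesystem if it was ever opened
                }
              } catch (IOException e) {
                LOG.warn("Error closing filesystem during shutdown", e);   // best-effort
              }
              // ...the existing shutdown work would continue here...
            }
          }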

          steve_l added a comment -

          This updated patch is cleaner to subclass; I've been adapting what I had in terms of SmartFrog support to deal with the new service base class.

          -it adds innerInit(), innerStart() and innerTerminate() methods for services to implement to do their work; these are protected methods that will only get called if a state change is actually required (i.e. thread-safe state checking before entering/invoking).

          -it adds a method onStateChange(ServiceState oldState, ServiceState newState) that is invoked when a state change occurs, for subclasses to monitor/log. The base class just logs the event at the info level.
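
          A sketch of how those two mechanisms fit together (the class body is illustrative; only the names innerStart and onStateChange come from the description above):

          import java.io.IOException;

          public abstract class ServiceHookSketch {

            public enum ServiceState { CREATED, STARTED, LIVE, FAILED, TERMINATED }

            private ServiceState state = ServiceState.CREATED;

            public void start() throws IOException {
              if (!enterState(ServiceState.STARTED)) {
                return;                     // already started: no work, no notification
              }
              innerStart();
            }

            /** Thread-safe state check; only fires the hook when the state actually changes. */
            private synchronized boolean enterState(ServiceState newState) {
              if (state == newState) {
                return false;
              }
              ServiceState oldState = state;
              state = newState;
              onStateChange(oldState, newState);
              return true;
            }

            /** Base class just logs; subclasses override this to monitor state transitions. */
            protected void onStateChange(ServiceState oldState, ServiceState newState) {
              System.out.println("State change " + oldState + " -> " + newState);
            }

            /** Subclasses do their startup work here. */
            protected abstract void innerStart() throws IOException;
          }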

          There is no TaskTracker support in there; that one is trickier. All the hadoop tests appear to work; the SF tests are failing with the in-process namenode not shutting down, which I believe is a problem in my codebase, not this patch.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12385519/hadoop-3628.patch
          against trunk revision 675078.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 15 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          -1 javac. The applied patch generated 522 javac compiler warnings (more than the trunk's current 520 warnings).

          -1 findbugs. The patch appears to introduce 2 new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2819/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2819/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2819/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2819/console

          This message is automatically generated.

          steve_l added a comment -

          This is an updated patch whose namenode.toString() works even when not initialized, so the logging of state changes doesn't NPE. It does not include TaskTracker changes.

          steve_l added a comment -

          patch without tasktracker

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12385643/hadoop-3628.patch
          against trunk revision 676069.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 12 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          -1 javac. The applied patch generated 522 javac compiler warnings (more than the trunk's current 520 warnings).

          -1 findbugs. The patch appears to introduce 2 new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2833/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2833/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2833/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2833/console

          This message is automatically generated.

          steve_l added a comment -

          3 tests have failed.

          #1 testJobTrackerPorts NPEs on job tracker termination, when the notifier (singleton) is terminated and set to null. The fact that it is a singleton is something I'd been hoping to deal with separately. Either it needs to be dealt with first, or the JobTracker terminate code made more robust.

          org.apache.hadoop.mapred.TestMRServerPorts.testJobTrackerPorts
          Failing for the past 1 build (Since Failed#2833 )
          Took 1 second.

          java.lang.NullPointerException
          at org.apache.hadoop.mapred.JobEndNotifier.stopNotifier(JobEndNotifier.java:92)
          at org.apache.hadoop.mapred.JobTracker.innerTerminate(JobTracker.java:812)
          at org.apache.hadoop.util.Service.terminate(Service.java:134)
          at org.apache.hadoop.util.Service.deploy(Service.java:266)
          at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:150)
          at org.apache.hadoop.mapred.TestMRServerPorts.canStartJobTracker(TestMRServerPorts.java:66)
          at org.apache.hadoop.mapred.TestMRServerPorts.testJobTrackerPorts(TestMRServerPorts.java:107)

          #2 and #3 are almost the same stack trace; the filesystem is closed when the client expects it to be open. This doesn't show up on the work box, so it could be a race condition that is surfacing here.

          org.apache.hadoop.mapred.TestRackAwareTaskPlacement.testTaskPlacement
          Failing for the past 1 build (Since Failed#2833 )
          Took 41 seconds.

          java.io.IOException: Filesystem closed
          at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:201)
          at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:569)
          at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:383)
          at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:663)
          at org.apache.hadoop.mapred.TestRackAwareTaskPlacement.launchJobAndTestCounters(TestRackAwareTaskPlacement.java:77)
          at org.apache.hadoop.mapred.TestRackAwareTaskPlacement.testTaskPlacement(TestRackAwareTaskPlacement.java:155)

          org.apache.hadoop.mapred.TestMultipleLevelCaching.testMultiLevelCaching
          java.io.IOException: Filesystem closed
          at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:201)
          at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:532)
          at org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:185)
          at org.apache.hadoop.mapred.TestMultipleLevelCaching.testCachingAtLevel(TestMultipleLevelCaching.java:117)
          at org.apache.hadoop.mapred.TestMultipleLevelCaching.testMultiLevelCaching(TestMultipleLevelCaching.java:69)

          Expect updated patches in the week of July 25.

          steve_l added a comment -

          This patch:

          • is updated to SVN HEAD, including the scheduling changes to JobTracker
          • incorporates a fix to HADOOP-3842, the race condition in job tracker scheduler startup
          • brings the TaskTracker and MiniMRCluster into the lifecycle (though MiniDFSCluster and MiniMRCluster are not yet defined as Service classes themselves)
          • tunes some of the tests to shut down more elegantly

          Here is the list of files in the diff:
          src/core/org/apache/hadoop/util/Service.java
          src/hdfs/org/apache/hadoop/hdfs/server/datanode/DataNode.java
          src/hdfs/org/apache/hadoop/hdfs/server/namenode/NameNode.java
          src/mapred/org/apache/hadoop/mapred/JobEndNotifier.java
          src/mapred/org/apache/hadoop/mapred/JobTracker.java
          src/mapred/org/apache/hadoop/mapred/TaskTracker.java
          src/test/org/apache/hadoop/fs/TestCopyFiles.java
          src/test/org/apache/hadoop/fs/TestFileSystem.java
          src/test/org/apache/hadoop/hdfs/MiniDFSCluster.java
          src/test/org/apache/hadoop/hdfs/TestMiniDFSCluster.java
          src/test/org/apache/hadoop/mapred/MiniMRCluster.java
          src/test/org/apache/hadoop/mapred/TestMRServerPorts.java
          src/test/org/apache/hadoop/mapred/TestMultipleLevelCaching.java
          src/test/org/apache/hadoop/mapred/TestRackAwareTaskPlacement.java
          src/test/org/apache/hadoop/mapred/TestTaskTrackerCleanupQueue.java

          steve_l added a comment -

          All tests are working on this machine other than TestULimit, which I am not sure is related -let's see what hudson has to say.

          steve_l added a comment -

          Incidentally, right now the lifecycle has, as suggested, a RUNNING and a READY state. I'm already finding myself confused by these. Maybe they should become STARTING and LIVE, which would make it clearer when a service is just starting up and when it considers itself live. I think this would be good, and now is the time to make such a change, but what do others think?

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12387094/hadoop-3628.patch
          against trunk revision 680577.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 33 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 3 new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2971/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2971/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2971/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2971/console

          This message is automatically generated.

          Tom White added a comment -

          > Maybe they should become STARTING and LIVE

          I would suggest STARTED (to match CREATED, INITIALIZED, FAILED, TERMINATED) and LIVE.

          steve_l added a comment -

          fixing findbugs and adding more tests

          steve_l added a comment -

          This updated patch:
          -should address the (over-cautious) findbugs warnings
          -includes the TaskTracker in the service lifecycle
          -adds a mock service and a test class to go alongside it, verifying that the expected lifecycle is followed
          -uses STARTED and LIVE as the states to transition through when starting.

          Based on the mock service tests, I've tweaked the ping() operation to

          • ignore ping() requests when starting up
          • call an innerPing() method when ping() is called when LIVE; this is for subclasses to add health checks
          • automatically go into FAILED state when innerPing() throws an exception.
            So: if your liveness test fails, that service is deemed to have failed.
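
          A sketch of that ping() behaviour (ServiceState and enterFailedState() are stand-ins, not necessarily the names used in the patch):

          import java.io.IOException;

          public abstract class PingSketch {

            public enum ServiceState { CREATED, STARTED, LIVE, FAILED, TERMINATED }

            protected volatile ServiceState state = ServiceState.CREATED;

            public final void ping() throws IOException {
              if (state == ServiceState.STARTED) {
                return;                           // still starting up: ignore the ping
              }
              if (state != ServiceState.LIVE) {
                throw new IOException("Service is not live: " + state);
              }
              try {
                innerPing();                      // subclass health checks run only when LIVE
              } catch (IOException e) {
                enterFailedState();               // a failed liveness test fails the service
                throw e;
              }
            }

            /** Subclasses override this to add their own health checks. */
            protected void innerPing() throws IOException { }

            protected void enterFailedState() {
              state = ServiceState.FAILED;
            }
          }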

          I'm using this downstream, it works fairly well.

          Let's see what hudson makes of it.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12387213/hadoop-3628.patch
          against trunk revision 680823.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 40 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

          -1 javac. The applied patch generated 525 javac compiler warnings (more than the trunk's current 524 warnings).

          -1 findbugs. The patch appears to introduce 3 new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2984/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2984/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2984/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2984/console

          This message is automatically generated.

          steve_l added a comment -

          The last patch is over-synchronized, blocking JobTracker in heartbeat().

          [junit] -- Blocked trying to get lock: org/apache/hadoop/mapred/JobTracker@0x1bcf418[thin lock]
          [junit] at jrockit/vm/Threads.sleep(I)V(Native Method)
          [junit] at jrockit/vm/Locks.waitForThinRelease(Locks.java:1233)
          [junit] at jrockit/vm/Locks.monitorEnterSecondStage(Locks.java:1307)
          [junit] at jrockit/vm/Locks.monitorEnter(Locks.java:2389)
          [junit] at org/apache/hadoop/mapred/JobTracker.heartbeat(JobTracker.java:1262)
          [junit] at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
          [junit] at jrockit/vm/Reflect.invokeMethod(Ljava/lang/Object;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;(Native Method)
          [junit] at sun/reflect/NativeMethodAccessorImpl.invoke0(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;(Native Method)

          is blocked by

          [junit] -- Waiting for notification on: org/apache/hadoop/ipc/RPC$Server@0x128b4e8[fat lock]
          [junit] at jrockit/vm/Threads.waitForNotifySignal(JLjava/lang/Object;)Z(Native Method)
          [junit] at java/lang/Object.wait(J)V(Native Method)
          [junit] at java/lang/Object.wait(Object.java:485)
          [junit] at org/apache/hadoop/ipc/Server.join(Server.java:1016)
          [junit] ^-- Lock released while waiting: org/apache/hadoop/ipc/RPC$Server@0x128b4e8[fat lock]
          [junit] at org/apache/hadoop/mapred/JobTracker.offerService(JobTracker.java:800)
          [junit] ^-- Holding lock: org/apache/hadoop/mapred/JobTracker@0x1bcf418[thin lock]
          [junit] at org/apache/hadoop/mapred/MiniMRCluster$JobTrackerRunner.run(MiniMRCluster.java:89)

          Chris Douglas added a comment -

          Should ping() catch Throwable from innerPing? Throwing an IOE with whatever came out of innerPing as its cause would widen the range of errors it can report, most of which are probably not I/O related.

          steve_l added a comment -

          Good idea. Given that lots of Hadoop code throws RuntimeException derivatives, this would catch them and present them to the caller.
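
          In terms of the ping() sketch shown earlier, the agreed change would amount to something like this (illustrative only; innerPing() and enterFailedState() are the stand-in names from that sketch):

          public final void ping() throws IOException {
            try {
              innerPing();
            } catch (IOException e) {
              enterFailedState();
              throw e;
            } catch (Throwable t) {
              enterFailedState();
              // Wrap anything else -- typically a RuntimeException -- as an IOException.
              IOException ioe = new IOException("Ping failed: " + t);
              ioe.initCause(t);
              throw ioe;
            }
          }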

          steve_l added a comment -

          This is an updated patch which appears to work on my machine, but a couple of tests fail intermittently, and never when run standalone. Let's see what hudson makes of it.

          Changes from previous patches

          • the innerPing() operation catches any throwable and reports it as a ping failure
          • the new classes have javadocs
          • tests that use MiniDFSCluster and MiniMRCluster have been edited to shut these clusters down reliably; this makes for a lot of the changes in the test classes.
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12387480/hadoop-3628.patch
          against trunk revision 682978.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 52 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3012/console

          This message is automatically generated.

          steve_l added a comment -

          Now that hudson is live, the latest patch is out of sync.

          Chris Douglas added a comment -

          Any updates on this?

          steve_l added a comment -

          I am still working on this (albeit with a break for some travel), but I've temporarily stopped syncing with SVN_HEAD while I try to get my local tests working before the Hadoop UK event; some of the lessons there are interesting.

          One test of mine that is failing is some filesystem operations on a cluster that has just been brought up; the problem here is that the namenode considers itself live when its filesystem is happy, but for FS operations the HDFS as a whole isn't live until enough datanodes are up. It's not enough to block until the namenode state==LIVE and ping() is good; I have to probe the number of datanodes and block until it is adequate. I've been doing that in my own codebase, retrofitting a method to my subclassed namenode and jobtracker to get their worker count and then exporting that via RMI.

          I think it would make sense to add a common interface to all services that have workers, which we could use to probe for this number, and we ought to think about making both it and the ping() operation available over RPC, so that other tools can check the cluster status.
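
          A hypothetical version of that interface and the blocking probe (neither exists in the patch; the names are illustrative):

          import java.io.IOException;

          interface HasWorkers {
            /** e.g. the number of datanodes currently registered with a namenode. */
            int getLiveWorkerCount() throws IOException;
          }

          final class WaitForWorkers {
            /** Block until the service reports at least the wanted number of workers. */
            static void awaitWorkers(HasWorkers service, int minimum, long timeoutMillis)
                throws IOException, InterruptedException {
              long deadline = System.currentTimeMillis() + timeoutMillis;
              while (service.getLiveWorkerCount() < minimum) {
                if (System.currentTimeMillis() > deadline) {
                  throw new IOException("Timed out waiting for " + minimum + " workers");
                }
                Thread.sleep(500);          // poll until enough workers are up
              }
            }
          }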

          Pete Wyckoff added a comment -

          Hi Steve,

          I was looking at adding a set of base RPC services to the NameNode via thrift for exactly this reason. This would of course include the ability to add a REST API, but also well-defined APIs in perl, python, java etc. It will be fairly easy by adding thrift's base services contrib class fb303, which exposes things like ping, aliveSince, .... I was actually waiting for this patch before trying to add the APIs.

          steve_l added a comment -

          Pete, I can stick up what I have, but since Johan's patch I'm not in sync with SVN_HEAD; I won't start merging until Wednesday.

          How about you create a new task, "add remote ping/health RPC methods to all services" that depends on this. I like the aliveSince attribute too, as that lets you recognise when a service restarted...

          steve_l added a comment -

          Here is the latest patch; it is not in sync with the recent changes to Datanode, so don't expect it to apply and compile cleanly. I will start merging again from next Wednesday.

          One thing I'm just getting working is managing datanode lifecycles.

          1. Is a datanode considered live only once registered? If so, we could declare it live only after register() succeeds.

          2. Similarly, should ping() require dnRegistration to be non-null?

          Konstantin Shvachko added a comment -

          I do not see any difference in the behavior of ping() between the (UNDEFINED, CREATED, INITIALIZED, STARTED) states and LIVE:
          In both cases ping will return nothing and the caller will not be able to tell whether the service is live or not.
          Do I understand the purpose of ping() correctly?
          In the case of FAILED or TERMINATED, ping() throws ServiceStateException, and this is the only way to know that the service is not live.
          But is this the right way to pass information outside? Exceptions are for errors, right?
          I think ping() is somewhat redundant because Service already has getServiceState(). This is as close to ping as it could be, IMO.

          > 1. is a datanode considered live only once registered?

          Maybe it needs to be clarified. A data-node is indeed not alive until it registers for the first time. It can later be asked to re-register
          when the name-node restarts, but it should still be considered alive when the name-node is down, even though formally the data-node
          is then not registered with anything out there.

          > 2. similarly, should the ping() require dnRegistration to be non null?

          After registration dnRegistration is not null, so ping() can assume that. If ping() is really necessary, see above.

          steve_l added a comment -

          This is the patch synchronized with SVN_HEAD; running local tests to see if everything still works

          steve_l added a comment -

          in sync with SVN_HEAD

          steve_l added a comment -

          Konstantin, thank you for your comments.

          I've just posted some slides where I talk about my experience of using the current prototype, where I'd been blocking my filesystem operations until the namenode and datanodes went live.
          http://people.apache.org/~stevel/slides/deploying_hadoop_with_smartfrog.pdf

          I want to delay work until the cluster itself is live, which is slightly harder to determine than aggregating the liveness of every node; an HDFS cluster is "live" if enough datanodes are connected to the namenode, where "enough" is a matter of personal preference. If there is a namenode outage then all the datanodes are still live, but the cluster itself is down.

          How do we represent these states to the caller?

          One option: fail the ping() with explicit exceptions for NoManager and NoWorkers. The Job and Task Trackers could throw the same exceptions when they were down. Callers would then be able to look out for these exceptions and recognise that these are probably transient states and not overreact, or react by checking the health of other parts of the cluster.

          An alternate option suggested by Doug Cutting is to have the nodes switch from LIVE to STARTED when they aren't connected to the other parts of the system. More formally:
          A Namenode is only live when it has 'enough' workers. When the number of datanodes drops below this value, the namenode reverts to the STARTED state until the count goes back above the predefined minimum.
          A Datanode is only live when it is connected to a namenode. When it deregisters or loses contact, it goes back to STARTED until it successfully registers.
          The job/task trackers would do the same.

          This approach would still aggregate well. A full M/R cluster would revert to STARTED if the namenode considered itself in the STARTED state, or if the JobTracker was in that state.

          1. I can prototype this if people think it is good.
          2. It would seem to make sense to have the ping() operation return the current state, so you could easily differentiate the started versus live ping(), the latter being stricter about system health.
          3. There will be some fun tests here, but good ones, as we get to verify the cluster copes with transient node outages. They may be slow, though.
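
          A rough sketch of how the second option might look inside a service, assuming the state enum and accessors discussed above; the method and threshold names are illustrative:

            // revert from LIVE to STARTED when the worker count falls below the
            // configured minimum, and go back to LIVE when it recovers
            private synchronized void updateLiveness(int liveWorkers, int minWorkers) {
              if (getServiceState() == ServiceState.LIVE && liveWorkers < minWorkers) {
                setServiceState(ServiceState.STARTED);
              } else if (getServiceState() == ServiceState.STARTED && liveWorkers >= minWorkers) {
                setServiceState(ServiceState.LIVE);
              }
            }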

          Konstantin Shvachko added a comment -

          Steve, thanks for the slides. Lots of interesting stuff.

          > HDFS cluster is "live" if enough datanodes are connected to the namenode, where "enough" is a matter of personal preference.

          Sounds like safe-mode, which precisely defines what "enough" is.
          We consider the name-node alive, when it leaves safe mode.
          Some links can be found in
          http://wiki.apache.org/hadoop/FAQ#12

          > How do we represent these states to the caller?
          > One option: fail the ping() with explicit exceptions

          This is what I do not understand. Why do you need ping() to fail if you can instead call getServiceState() and get more
          detailed information on the node state rather than just boolean failed / not failed provided by ping()?

          > A Datanode is only live when it is connected to a name node.

          It would be good to have precise definitions like that for the states of different servers (in JavaDoc).
          Sometimes it is not clear what the difference is between INITIALIZED, STARTED, and LIVE,
          or why it matters whether the server was TERMINATED or FAILED.

          steve_l added a comment -

          Konstantin - glad you like the slides; we could do a phone conference and present them to you, ideally with a demo of what I have working.

          ping() is interesting because it's a way of checking, even remotely, that a system is healthy. A good check here verifies that all the composite parts of the system are happy, and if so, it's happy too. Which means that being able to aggregate the health of other things is handy. But ping() can do more than just report the current health/state; it can do some extra work that checks the state better. For example, if a health requirement is that a specific directory must be writeable and its clock must be close to that of the host CPU, a good test is: create a file of a few bytes and check its timestamp. But at the same time, it's good to have a fast test without major side effects. If ping() created a 20MB file then it would be slow and other things could suffer. Fast tests are good.
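
          As a concrete illustration, a fast directory check of that kind might look roughly like this; the checkedDirectory field and MAX_CLOCK_SKEW_MILLIS constant are assumptions, and a real innerPing() would be tuned to each service:

            // sketch of a cheap health probe: write a tiny file and verify that its
            // timestamp is close to the local clock (uses java.io.File/FileOutputStream)
            protected void innerPing() throws IOException {
              File probe = new File(checkedDirectory, ".ping-probe");
              long before = System.currentTimeMillis();
              FileOutputStream out = new FileOutputStream(probe);
              try {
                out.write(0);
              } finally {
                out.close();
              }
              long skew = Math.abs(probe.lastModified() - before);
              if (skew > MAX_CLOCK_SKEW_MILLIS) {
                throw new IOException("Clock skew of " + skew + "ms writing " + probe);
              }
              probe.delete();
            }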

          Some goals of the design are:
          -easy to subclass. I'm experimenting with an extended namenode that fails a ping() until a minimum number of workers are attached.
          -easy to aggregate. A datanode is live if the filesystem.ping() is happy; a cluster could be defined as live if 3 out of 5 datanodes were up at any point in time.
          -let us extend with more than just the main services. I have things that I ping to hit web pages and expect a response in a specific range, or check for files existing, without needing to create new threads for everything that I check.

          The current split between getServiceState() and ping() has the getServiceState() call reporting the state at the last time it was changed, while ping() can actually do some health checking. And it doesn't need another thread if you don't want to do work in the background.

          >Sometimes it is not clear what is the difference between INITIALIZED, STARTED, and LIVE.
          >Or why does it matter whether the server was TERMINATED or FAILED.

          I will try to document the design. It's based on some of the stuff I've done in SmartFrog and some of the grid standards bodies, where I learned to avoid any complex DMTF-compatible state model as it got way too complex fast.
          INITIALIZED: you've read your configuration in, but not started any threads or anything. This is done in init(), and is a good place for subclasses to do tricks before or after super.innerInit() is called.
          STARTED: you've been told to start, and once you are ready, you will go live.
          LIVE: ready to do useful work. Something may still go wrong on a request, but, well, that's the only way to be 100% sure that anything is healthy.
          FAILED: something went wrong, you have no intention of ever working again, but you are still instantiated.
          TERMINATED: at this point you are about to be deleted; all references to you are removed. It's the end point of the state graph.

          steve_l added a comment -

          This patch is in sync with SVN HEAD and works around the consequences of HADOOP-3780.

          Let's see what Hudson thinks of this. I'm about to go offline until thursday, so don't expect any follow up comments until then.

          While running tests locally, TestFileAppend2 failed once; it was calling MiniDFSCluster.waitActive(), but the cluster was still in safe mode. I'm not sure if these changes triggered that event, or if it is a race condition that has finally surfaced. Let's see what Hudson says; it's easy to add a MiniDFSCluster.waitClusterAlive() to block until the cluster is actually live.
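
          A minimal sketch of what that helper could look like, assuming MiniDFSCluster can reach the namenode's lifecycle state; the timeout handling is illustrative:

            // block until the namenode reports itself LIVE, or give up after a timeout
            public void waitClusterAlive(long timeoutMillis)
                throws IOException, InterruptedException {
              long deadline = System.currentTimeMillis() + timeoutMillis;
              while (getNameNode().getServiceState() != Service.ServiceState.LIVE) {
                if (System.currentTimeMillis() > deadline) {
                  throw new IOException("Timed out waiting for the cluster to go live");
                }
                Thread.sleep(500);
              }
            }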

          Tom White added a comment -

          Steve, It would be nice to put the lifecycle state diagram from your slides into the Javadoc for Service, if possible.

          Jim Kellerman added a comment -

          HBase uses the following to determine when Hadoop exits safe mode:

              if (this.fs instanceof DistributedFileSystem) {
                // Make sure dfs is not in safe mode
                String message = "Waiting for dfs to exit safe mode...";
                while (((DistributedFileSystem) fs).setSafeMode(
                    FSConstants.SafeModeAction.SAFEMODE_GET)) {
                  LOG.info(message);
                  try {
                    Thread.sleep(this.threadWakeFrequency);
                  } catch (InterruptedException e) {
                    //continue
                  }
                }
              }
          
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12388752/hadoop-3628.patch
          against trunk revision 688101.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 50 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 3 new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3087/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3087/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3087/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3087/console

          This message is automatically generated.

          steve_l added a comment -

          Test org.apache.hadoop.hdfs.TestDatanodeDeath.testDatanodeDeath failed.

          java.io.IOException: Error Recovery for block blk_1901352949957000872_1001 failed because recovery from primary datanode 127.0.0.1:44320 failed 6 times. Aborting...
          at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2287)
          at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:1822)
          at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1979)

          Konstantin Shvachko added a comment -

          Steve,

          1. In the patch you consider name-node LIVE when it finishes initialization.
            At this point the name-node starts serving data-nodes for registrations and block reports and clients for read-only requests.
            Namespace modification and replication will be prohibited until the name-node is out of safe mode. This particularly means that jobs will not be able to run because they will not be able to create input files or replicate configuration files.
            So either we should say that the name-node is LIVE when it is out of safe mode, or we should introduce some other state that reflects the name-node's readiness to perform the complete set of services.
            I think we should consider name-node LIVE when it comes out of safe mode.
          2. JavaDocs should explicitly define each ServiceState for each server. E.g.
            /** DataNode is LIVE if it successfully completed registration with the name-node */
            /** NameNode is LIVE if it is out of safe mode */
            

            etc. I mean saying "the service is now live and available for external use" is right, but too general, it can be clarified in "native" terms for each server.

          3. Looking at your ping() implementation, it seems that the purpose of this method is to report up a string describing what exactly is wrong with the server at the moment. So why do we not just return the string, instead of throwing an exception whose only purpose is to wrap that string? Imagine how one would use this: catch the exception, call getMessage(), and then print or parse the message.
            As Doug advocated many times: "Exceptions should not be used for normal control flow." I think for ping() this is normal control flow.
          steve_l added a comment -

          This is the patch in sync with SVN_HEAD; let's see what happens on hudson.

          steve_l added a comment -

          Checking to see how Hudson deals with this; all local tests are passing except the ulimit streaming test, which is currently incompatible with 64-bit JRockit.

          steve_l added a comment -

          Konstantin Shvachko has proposed that instead of throwing an exception on a health failure, different error text should be returned. His concern is that the cost of constructing and marshalling an exception can be high, and that exceptions should not be used if they are likely to be raised regularly.

          This is an interesting thought. The arguments in favour of an exception are:

          1. For individual nodes, failure is the unexpected state. Normally all should be well.
          2. Much of the cost of creating an exception is the cost of creating a stack trace; this can be useful for later diagnostics.
          3. By returning different exceptions for different problems, callers can diagnose and act on different failures. It is much harder for programs to act on simple strings.
          4. Exceptions are going to be raised if the far end is unreachable, so the caller needs to be prepared for those exceptions, and know that any exception raised on a call is a sign of a failure.

          However, there are some good arguments in favour of returning a structured response instead.

          1. Stack traces are less useful when they are just code inside the health check.
          2. Unmarshalling exceptions reliably requires the caller to have a set of exception classes and versions in their JVM. With nested exceptions, that implies the entire exception list needs to be unmarshallable.
          3. The aggregate health checks of the clusters themselves will inevitably include failed nodes. Should an aggregate health check include those node failures in a report that says "overall, we are healthy, here are the nodes that are not"?

          The alternative to sending exceptions back on a ping() would be to return a NodeHealth structure that included node name, IPAddress, service type and a list of what was wrong with the node, as well as an aggregate "live/not live" response. The list of what was wrong could include standard constant values for machine interpretation, as well as human-readable messages.
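
          For discussion, a NodeHealth structure along those lines might look like the sketch below; the field names and ProblemCode values are assumptions, and a real version would also need to be serializable over Hadoop RPC:

            public class NodeHealth {
              /** machine-readable problem codes; NO_MANAGER/NO_WORKERS mirror the
                  transient states discussed above (uses java.util.List) */
              public enum ProblemCode { NO_MANAGER, NO_WORKERS, OTHER }

              private final String nodeName;
              private final String ipAddress;
              private final String serviceType;
              private final boolean live;
              private final List<ProblemCode> problemCodes;
              private final List<String> problemMessages;   // human-readable details

              public NodeHealth(String nodeName, String ipAddress, String serviceType,
                  boolean live, List<ProblemCode> problemCodes, List<String> problemMessages) {
                this.nodeName = nodeName;
                this.ipAddress = ipAddress;
                this.serviceType = serviceType;
                this.live = live;
                this.problemCodes = problemCodes;
                this.problemMessages = problemMessages;
              }

              public boolean isLive() { return live; }
            }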

          What do others think?

          Pete Wyckoff added a comment -

          from the standpoint of external APIs, error codes would seem to be much better suited.

          steve_l added a comment -

          Here is my write up of the lifecycle, experience implementing it, testing it and outstanding issues. I will submit the PDF version next.

          steve_l added a comment -

          Here is the PDF version of the document.

          Owen O'Malley added a comment -

          I think that strings are a really bad plan. The reasonable choices are enums and exceptions and the exceptions provide a lot more opportunities for explaining what went wrong.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12389295/hadoop-3628.patch
          against trunk revision 691306.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 53 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 3 new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3161/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3161/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3161/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3161/console

          This message is automatically generated.

          steve_l added a comment -

          One possibility is that you can ping() a node or a cluster to get its state, and that the response is a list of one or more nodes and their health, which can include embedded exceptions. So a single-node health check would return the machine-readable exception for that node; ping an aggregate cluster and you get the list for all of them.

          Assuming that there are some standard IOExceptions for Hadoop-specific states (not enough disk space, bad versions, etc.), and that everyone has synchronized class versions, then this would work. The HTML page front end would take this and generate something for people; the raw operation would be something for machines to handle.

          steve_l added a comment -

          3 javadoc warnings
          [exec] [javadoc] Building tree for all the packages and classes...
          [exec] [javadoc] /zonestorage/hudson/home/hudson/hudson/jobs/Hadoop-Patch/workspace/trunk/src/core/org/apache/hadoop/util/Service.java:104: warning - Tag @link:illegal character: "42" in "* ServiceState#TERMINATED"
          [exec] [javadoc] /zonestorage/hudson/home/hudson/hudson/jobs/Hadoop-Patch/workspace/trunk/src/core/org/apache/hadoop/util/Service.java:104: warning - Tag @link: reference not found: * ServiceState#TERMINATED
          [exec] [javadoc] /zonestorage/hudson/home/hudson/hudson/jobs/Hadoop-Patch/workspace/trunk/src/mapred/org/apache/hadoop/mapred/JobTracker.java:564: warning - Tag @link: reference not found: Service.#start()

          Konstantin Shvachko added a comment -

          Steve,

          • I totally support your idea of returning a structured response. A NodeHealth structure may also include the service state, a message and an error code (as Pete proposes) in the form of an enum (as Owen asks).
          • But I don't think it should be thrown as an exception, because you can only pack text within an exception.
          • I also think that ping() can be merged with getServiceState() after that, because the getServiceState() result is a subset of what ping() returns.
          • BTW, I was not proposing to return a string and my main concern is not the cost of marshaling.
          • I was advocating not to throw exceptions as a normal result of an operation. Because exceptions are for error handling.

          The life cycle api is a very interesting and useful abstraction.
          And that is why I think it is so important to refine it.
          It is a public api, so once committed it will be hard to change it.

          The attached pdf helps a lot. Is it going to be a part of the Hadoop documentation?
          A minor correction: safe mode does not apply to data-nodes, only to the name-node.
          I think your previous variant of defining a data-node live after it registered with the name-node was correct.

          steve_l added a comment -

          need more test coverage

          steve_l added a comment -

          Konstantin,
          -Of course the docs can be part of the Hadoop documentation; they've got Apache licenses on them for that reason.

          -The nice thing about getServiceState() is that it is side-effect free. Once we add more advanced health checks into things, the cost of a ping() will increase. So yes, the same info will be returned, but a ping() does more work. If you take that away then every service would have to create another thread to do its own health checks; not that expensive for big services, but high for small things deployed many-to-a-JVM.

          -If we do away with the separate operation, then HTML pages for each service could still get the state and return something other than 200 if the service was not in one of the desired states. Load balancers and other tools could use this with a GET of /service/state?state=LIVE to demand a live service (see the sketch at the end of this comment).

          -I'll try to prototype a response that includes the state plus other info, though I'm pretty busy with other things for the next week. I will propose some ideas first.

          Some of the WS-* management APIs have a state model that can include lots of inner information:
          http://docs.oasis-open.org/wsdm/wsdm-muws2-1.1-spec-os-01.htm#_Toc129683823. I'm not sure how far down that path we should go. Maybe people who use the various management tools to monitor their cluster will have opinions here?
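
          A minimal sketch of such a state-checking page, written as a servlet; the URL, parameter name and wiring of the Service reference are assumptions:

            public class ServiceStateServlet extends HttpServlet {
              private Service service;   // would be injected by the hosting daemon

              @Override
              protected void doGet(HttpServletRequest request, HttpServletResponse response)
                  throws IOException {
                String wanted = request.getParameter("state");      // e.g. "LIVE"
                String actual = service.getServiceState().toString();
                if (wanted == null || wanted.equalsIgnoreCase(actual)) {
                  response.setStatus(HttpServletResponse.SC_OK);
                  response.getWriter().println(actual);
                } else {
                  // a non-200 response tells load balancers the node is not ready
                  response.sendError(HttpServletResponse.SC_SERVICE_UNAVAILABLE,
                      "state is " + actual + ", not " + wanted);
                }
              }
            }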

          Pete Wyckoff added a comment -

          I opened https://issues.apache.org/jira/browse/HADOOP-3969 to implement a mechanism for instantiating a class that exposes the public service APIs to external tools.

          package org.apache.hadoop.util;

          public interface ExposeServiceAPIs {

            /**
             * @param service the service whose APIs are to be exposed
             * @param serviceName a symbolic name for the service
             * @param conf the Hadoop configuration object
             */
            public void initialize(Service service, String serviceName, Configuration conf)
                throws IOException;

            public boolean start();

            public boolean stop();
          }
          
          

          The reference implementation will be REST, which I'm working on now, but it will obviously need some hooks in the Service class itself.

          steve_l added a comment -

          This patch isn't ready to go into the code; I'm putting it up to show progress.

          -ping() now builds up a status that can include a list of exceptions, so a ping failure can list everything that is wrong. One thing that is unclear is whether a ping() failure should trigger a transition into the FAILED state, or whether the code should hope that the system can recover.

          -The Service class implements Closeable, and the void close() throws IOException method of that interface - it means "close me, and you may throw an exception". terminate() relays to close() and logs any failure. If you want to close quietly, use terminate(); if you are OK with an exception, call close(). Anything aggregating other services has to call terminate() or otherwise discard the exceptions, so it does a best-effort shutdown.

          -I've moved MockService into the core, though tools is probably better. Why out of test? Because it turns out to be invaluable for Hadoop management tools to test their ability to handle failures in the service lifecycle.

          I don't think ping() and the termination logic are stable yet; they seem over-complex. I'd rather use Closeable and close() everywhere, including on other things that are aggregate services (MiniDFSCluster, MiniMRCluster).
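
          A minimal sketch of the terminate()/close() split described above (the LOG field is illustrative):

            /** close() may throw; terminate() must never do so */
            public void terminate() {
              try {
                close();
              } catch (IOException e) {
                LOG.warn("Exception while closing " + getServiceName(), e);
              }
            }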

          steve_l added a comment -

          This is the lifecycle patch in sync with SVN HEAD. On my system it appeared to pass all tests, and it passes our own set, including ones that check that orphan tasktrackers and namenodes can be shut down. There the answer is "yes, but it can take up to a minute".

          Although I'm marking it as "grant license" so that Hudson will look at it, I am currently going through the approval process for HP to say "we release this code to Apache". People can download and play with it, but no committing, please.

          steve_l added a comment -

          Submitting the patch. Incidentally, there's no 0.20.1 or 0.21 on the JIRA version list; time to add them?

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12395934/hadoop-3628.patch
          against trunk revision 726129.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 56 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3733/console

          This message is automatically generated.

          Suresh Srinivas added a comment -

          Some comments on the document:
          -Section 1.1 - the Secondary NameNode is listed as "failover for the NameNode". It is currently a checkpointing node.
          -Section 1.1 - the DataNode does not have a notion of safemode. The same applies to the section 2.1 liveness discussion.
          -Section 3.1 - should we have a state that captures being out of service for maintenance?
          -Section 3.1.1 - I am not sure why the INITIALIZED state is required. Initialization could be part of starting up? I prefer merging the states INITIALIZED and STARTED and calling the result STARTING.
          -I am not clear on how the FAILED state transitions to TERMINATED. If the failed state transitions to terminated, will the fact that the service failed no longer be available?

          Comments on the code:
          -A state machine pattern could be used as follows, with the Service as the context. The base ServiceState could have default implementations or force the concrete states to implement the messages start, stop, terminate and ping. Each state can have enter and exit methods to implement the necessary functionality when entering and exiting a state.

             private abstract class ServiceState {

              abstract void enter(Service service, ServiceState newState);

              abstract void exit(Service service);

              void start(Service service) {
                // default implementation: either ignore or throw an exception.
                // I prefer not ignoring, and throwing an exception if the service
                // is already started
              }

              // Default implementation
              void stop(Service service) {
                service.enterState(Service.terminatedState);
              }

              // Default implementation: subclass states fill in a real status
              ServiceStatus ping() {
                return null;
              }

              void terminate(Service service) {
                // default: no-op
              }

              abstract String getName();
            }
          

          The concrete states override only the messages relevant to them. For example, the start method is overridden only by CreatedState.

            class CreatedState extends ServiceState {
              void enter(Service service, ServiceState newState) { /* ... */ }
              void exit(Service service) { /* ... */ }
              void start(Service service) { /* ... */ }
              void terminate(Service service) { /* ... */ }
              String getName() { return "CREATED"; }
            }

            class LiveState extends ServiceState {
              void enter(Service service, ServiceState newState) { /* ... */ }
              void exit(Service service) { /* ... */ }
              String getName() { return "LIVE"; }
            }
          
          

          -With the above implementation, setServiceState will change to:

            protected final void setServiceState(ServiceState newState) {
              this.serviceState.exit(this);
              this.serviceState = newState;
              this.serviceState.enter(this, newState);
            }

          This gives each state a chance to do things needed during entry and cleanup during exit.

          -A singleton instance of each state is used.

          The advantages of doing the above are:
          1. I am currently looking at introducing HA for the NameNode and JobTracker. When doing this, I can easily introduce new states such as Active and Standby by introducing, say, an HAService deriving from Service, and reusing a lot of the code and the base ServiceState.
          2. Checks done in methods such as start and ping for the appropriate state can be avoided, since only the appropriate states handle the methods and the rest get the default implementation from the superclass ServiceState.
          3. isValidStateTransition checks can also be moved to the enter method of each state.

          -In the comment for the class Service, if I understand right, it looks like innerStart is not synchronous; that is, when it returns the service may not have started completely. I prefer it to be synchronous, where the method blocks until the service has started before returning (if the start is done in other threads). Otherwise, since an exception cannot be thrown, in the case of start failures enterFailedState needs to be called by the Service subclass. The comments need to indicate this.

          -start method is honored even if the current state is TERMINATED. Is this intended?
          -Why are we using IOException to indicate exceptional conditions from methods such as innerStart etc.?

          -Since the ping method is called by an external entity to check the service state, depending on it to enter FailedState does not seem right.

          -innerStart, innerPing, innerClose, getServiceName, etc. need to be abstract methods to force the subclasses to provide concrete implementations.

          -I am not sure why two flavors of the enterState method are needed. The second one, with the entryState parameter, seems unnecessary. Instead, should we allow the transition from LIVE to STARTED, as that is the only case where this is used?

          -verifyState with a single expected state seems better. verifyState with expected1, expected2 can be avoided by calling verifyState twice, once for each expected state. That seems simpler.

          -I prefer not supporting the Datanode.shutdown method. Instead, use the Service method Datanode.terminate to avoid different public methods doing the same thing. The same applies to other classes inheriting from Service. This could be another Jira.

          -Should innerPing throw LivenessException?

          -In the ping method, the ServiceStatus returned has a throwable in states other than LIVE. The comment seems to indicate this is done only for the FAILED and TERMINATED states; it will happen for the CREATED and STARTED states as well.

          -It would be better not to pass the ServiceState to onInnerPingFailure just so that the FAILED state can be set in it along with the exception; that could be done in the ping method instead. It is one less thing to do in onInnerPingFailure when overriding it.

          -Since the change is quite big, I will take another stab at the review, if you submit an updated patch.

          Hide
          steve_l added a comment -

          This is the patch in sync with SVN_HEAD; still going through the signoff paperwork for Apache rights, so it's here as 'for the interested' rather than for considering committing.

          Hide
          steve_l added a comment -

          Suresh, thanks for the comments; I've got an annotated copy of the doc from Tom White to run through now too. Now that I'm in sync with SVN_HEAD and all appears well, I can sit down and look at the comments -I will have an update out by the end of the week.

          Hide
          steve_l added a comment -

          Suresh, I'm finally sitting down to look at your comments; it's going to take me a while to work through them. Thank you (and Tom!) for the comments.

          Some quick answers.

          "Should we have a state that captures out of service for maintenance?"

          It's kind of tricky to have that on something that is network visible, since a lot of maintenance makes the program unreachable. At the same time, I could imagine a cluster being offline to new job submissions, or a filesystem in read-only mode.

          On a related note, I've long wondered if we should have an XHTML format for web sites to say that they are offline for maintenance in some way that was machine readable; something you could aggregate and which would include forward maintenance notices. Something would hit the status pages of the various machines in your infrastructure and build up a calendar of planned outages, work out which were the SPOF and aggregate them differently from the redundant bits. I don't think this is the right time for this, but it's still an idea I'm fond of.

          "-I am not clear on how Failed state transitions to Terminated. If failed state transitions to terminated, the fact that the service failed will no longer available?"

          Good question. When terminated, a service should shut down its threads and do any cleanup. But any underlying exception that triggered a failure should still be in the failureCause field. So provided that a throwable was passed in to enterFailedState(), the service remembers what happened.
          What I'm not doing (currently) is retaining the entire history. We could do that if you felt it was useful: build up a list of state transitions and timestamps, the history (see the sketch below).
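          To make that concrete, a small self-contained sketch of what retaining the history could look like; StateChangeRecord and TransitionHistory are illustrative names, not part of the patch, and the state names are the ones already used in this discussion:

            import java.util.ArrayList;
            import java.util.Collections;
            import java.util.List;

            // Hypothetical record of one lifecycle transition.
            class StateChangeRecord {
              enum State { CREATED, STARTED, LIVE, FAILED, TERMINATED }

              final State from;
              final State to;
              final Throwable cause;                       // non-null only when entering FAILED
              final long timestamp = System.currentTimeMillis();

              StateChangeRecord(State from, State to, Throwable cause) {
                this.from = from;
                this.to = to;
                this.cause = cause;
              }
            }

            // A service would append to this alongside setting failureCause.
            class TransitionHistory {
              private final List<StateChangeRecord> history =
                  Collections.synchronizedList(new ArrayList<StateChangeRecord>());

              void record(StateChangeRecord change) {
                history.add(change);
              }

              List<StateChangeRecord> snapshot() {
                synchronized (history) {                   // required when copying the list
                  return new ArrayList<StateChangeRecord>(history);
                }
              }
            }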

          I'm going to look more at your state machine proposal. One thing that worries me about being able to add new states is how well they aggregate.

          Hide
          steve_l added a comment -

          This is an annotated copy of the document with comments by Tom White; here for completeness and for Steve to merge in.

          Hide
          steve_l added a comment -

          Here is the latest patch in sync with SVN head; still chasing sign-off by various open source committees.

          Hide
          steve_l added a comment -

          This is an updated patch. Some race conditions at shutdown while jobs are in the queue are being dealt with.

          Hide
          steve_l added a comment -

          Noting that a stable wire format for exceptions could be used in the ping() operation; the ServiceStatus would contain a list of events (text or exceptions), with or without stack traces. This would reduce the cost of returning a long list of problems.
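          A rough, self-contained sketch of that idea; the class and field names below are illustrative only, not the patch's actual ServiceStatus:

            import java.io.Serializable;
            import java.util.ArrayList;
            import java.util.List;

            // Wire-friendly status: problems are carried as flat events, not raw Throwables.
            class ServiceStatusSketch implements Serializable {

              static class Event implements Serializable {
                final String message;          // human-readable text
                final String exceptionClass;   // original exception class name, if any
                final String[] stackTrace;     // may be empty to keep the payload small

                Event(String message, String exceptionClass, String[] stackTrace) {
                  this.message = message;
                  this.exceptionClass = exceptionClass;
                  this.stackTrace = stackTrace;
                }
              }

              private final List<Event> events = new ArrayList<Event>();

              // Convert a Throwable into an event, optionally including its stack trace.
              void add(Throwable t, boolean includeTrace) {
                StackTraceElement[] frames = t.getStackTrace();
                String[] trace = new String[includeTrace ? frames.length : 0];
                for (int i = 0; i < trace.length; i++) {
                  trace[i] = frames[i].toString();
                }
                events.add(new Event(t.getMessage(), t.getClass().getName(), trace));
              }

              List<Event> getEvents() {
                return events;
              }
            }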

          Hide
          steve_l added a comment -

          -The ASF License is granted here because management have given me a green light for this
          -this is against SVN_HEAD, with changes to handle the shared cleanup queue (we no longer try to shut down the thread when a TT service terminates)
          -JobTracker and TaskTracker have basic ping() operations

          Hide
          Eric Yang added a comment -

          Resubmitting the patch to Hudson; trunk tests were broken by HADOOP-5409.

          Hide
          steve_l added a comment -

          Doesn't compile; need to include a patched RPC class:

          compile-mapred-classes:
          [javac] Compiling 278 source files to /home/slo/Java/Apache/hadoop-core/build/classes
          [javac] /home/slo/Java/Apache/hadoop-core/src/mapred/org/apache/hadoop/mapred/TaskTracker.java:578: waitForProxy(java.lang.Class,long,java.net.InetSocketAddress,org.apache.hadoop.conf.Configuration,long) is not public in org.apache.hadoop.ipc.RPC; cannot be accessed from outside package
          [javac] RPC.waitForProxy(InterTrackerProtocol.class,
          [javac] ^
          [javac] Note: Some input files use or override a deprecated API.
          [javac] Note: Recompile with -Xlint:deprecation for details.
          [javac] 1 error

          Hide
          Konstantin Shvachko added a comment -

          I see this committed to the trunk. Did you intend it?

          Hide
          Chris Douglas added a comment -

          Surely the intent was to commit to the 3628 branch, as discussed on the dev list. Either way, it must be reverted (and has been).

          Hide
          steve_l added a comment -

          Sorry, this was not meant to go to trunk; something must have got confused between the command line and SmartSVN. Thanks for rolling this back; I was not trying to break anything.

          Hide
          Hudson added a comment -

          Integrated in Hadoop-trunk #778 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/778/)
          Revert
          this is all the files changed to add lifecycle to the Hadoop services, new tests, and patches to many existing tests to shut down the mini clusters that have been given a consistent java.io.Closeable interface, along with static utility methods to close them

          Hide
          steve_l added a comment -

          For anyone wondering what the status of this is, I stopped synchronizing my code just before ApacheCon, to get some things working (and an RPM out the door), and took a vacation. Now I'm catching up; it turns out the easiest approach is to diff the state between my branch and the original code at the time of the branch, then semi-manually apply those changes to SVN head of trunk: first the filesystem, then the job tracker.

          Hide
          steve_l added a comment -

          There is a new branch for this, HADOOP-3628-2, which is trunk with the lifecycle on top, and I've stripped the patches down to the smallest set of changes needed. Now running the tests to see if any more break on my machine than on trunk.

          Hide
          steve_l added a comment -

          This is the lifecycle code running against SVN_HEAD, contains a couple of smaller patches I need to tease out.

          Hide
          steve_l added a comment -

          I should add that the ping() method is now protected in Service, which allows subclasses access to the operation but means that this lifecycle work isn't trying to provide any stable public ping/health test. Subclasses can expose and use it, but there's an implicit "this may change" message.
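          A tiny illustration of that point, with stand-in names (SketchService, SketchStatus, SketchDataNode); only the visibility of ping() is what's being shown:

            // Stand-in types; the real Service and ServiceStatus live in the patch.
            class SketchStatus {
            }

            abstract class SketchService {
              // Protected: usable by subclasses, deliberately not a stable public API.
              protected SketchStatus ping() {
                return new SketchStatus();
              }
            }

            class SketchDataNode extends SketchService {
              // A subclass may choose to publish its own health check built on ping().
              public SketchStatus checkHealth() {
                return ping();   // callers accept an implicit "this may change"
              }
            }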

          Hide
          Chris Douglas added a comment -

          The patch needs to be regenerated for the project split. I hope the split will make developing, reviewing, and committing these changes easier, not harder.

          Hide
          steve_l added a comment -

          Hi Chris
          I've been on vacation, but yes, I know this will be work. Something like:

          -create the patch by diffing the branch with the relevant version of the old trunk
          -apply the relevant bits of the patch to each branch
          -handle the ongoing changes
          -test everything
          -create separate issues for each branch

          Splitting the project into 3 has probably made cross-section patches trickier. You will know, if you have a change to MR, that it won't impact core or dfs, but not the other way around.

          Hide
          Allen Wittenauer added a comment -

          It would probably be good to concentrate on core/common/name-of-the-week, get it committed, then work on HDFS, get it committed, then work on MR, get it committed. This way some incremental progress can be made, giving everyone some experience with these interfaces, etc, without having to wait for the One Big Patch. Additionally the One Big Patch approach seems to be floundering; I have concerns it will never ever get committed.

          Hide
          Chris Douglas added a comment -

          Splitting the project into 3 has probably made cross-section patches trickier. You will know if you have a change to MR that it wont impact core or dfs, but not the other way around.

          That may recommend pushing common/mapred pieces in first, rather than trying to keep all three issues current. Common changes more slowly and is unlikely to conflict as you work on it, but the mapred/hdfs pieces will probably require more iteration.

          Hide
          steve_l added a comment -

          I'm +1 on getting something in then patching the other bits, and evolving the original patch if needed in the process.

          Another way to look at the problem is: which bit of code matters most? I'd argue it's DFS, as it's the one with the ability to lose persistent data. Adding code to MapRed could generate slower or wrong answers from work, which can lead to subtle data corruption, but it's not the thing that sets off Allen's pager at night.

          Hide
          steve_l added a comment -

          This is the diff against the final pre-branch version of Hadoop. Don't try to apply this at home.

          Hide
          steve_l added a comment -

          I've been quiet on pushing out patches for this, because I'm not going to try getting them into 0.21; once that branch has happened I will worry about getting the svn commit in. There are two reasons for this:

          1. I'm busy using the branches with the lifecycle patched in, and learning from the experience.
          2. I'm worried about the problem of what happens if you try to stop a service while it is starting up.
            Some of the services do take a while to start, and you should have the right to interrupt something like a JobTracker and tell it to go away.

          This updated document:

          1. Looks at the problem of stop-during-startup
          2. Looks at other lifecycles out there
          3. Removes all ping/liveness
          4. Suggests future areas of development
          Hide
          Hong Tang added a comment -

          2. I'm worried about the problem of what happens if you try to stop a service while it is starting up. Some of the services do take a while to start, and you should have the right to interrupt something like a JobTracker and tell it to go away.

          Without looking at the patch, I am a bit confused by "... do take a while to start"; do you mean the service will take a while in STARTING or in INITIALIZING? I think in general a component should do the minimum to set itself up to a consistent state in the STARTING stage and defer lengthy initialization to the INITIALIZING stage. And we would allow STARTING to run through uninterrupted but support aborting the initialization.

          Hide
          steve_l added a comment -

          -I mean that the JT will spin until the filesystem is live, as part of its job tracker restart handling, and the TT will block inside initialize() at RPC.waitForProxy() until the JT is responding.

          The HDFS nodes come up faster; the DN just initialises itself, opens the HTTP ports -the polling for the NN takes place in a separate loop.

          The namenode comes up and then makes sense of its logs before accepting callers.

          So right now, if you try to bring up a JT without a filesystem, it ends up hanging around, which blocks all the task trackers too. It's not obvious in a cluster provided that the filesystem does come up, but I do worry that it makes the JT more vulnerable to a transient NN outage.

          What I can generate right now, then, is a situation where you try to bring up the MapReduce layer without a functional filesystem, and everything just blocks. In the 0.21 codebase, that means your Java threads block in the constructors. You can try to interrupt them, though we'd have to walk through every method called to make sure interrupts propagate up reliably (I'll have to do that anyway, I suspect). If you move the startup logic out of the constructor, it will still block, only now you have a reference to the object, which can be good or bad. Good: you can call methods on it; bad: it is still starting up. What I'd like is a way to reliably stop the JT and task tracker while they are waiting for the filesystem to come up. Which, as it happens every time, is something that we can test for.
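          As a sketch of the kind of interruptible wait being asked for (not the patch itself, and the names are made up): poll the dependency with a bounded sleep and bail out cleanly when the startup thread is interrupted by a stop request.

            import java.io.IOException;

            class InterruptibleStartup {

              // Placeholder for a check such as "the JobTracker answers RPC".
              interface Probe {
                boolean isDependencyLive();
              }

              static void waitForDependency(Probe probe, long pollMillis) throws IOException {
                while (!probe.isDependencyLive()) {
                  if (Thread.currentThread().isInterrupted()) {
                    throw new IOException("startup interrupted while waiting for dependency");
                  }
                  try {
                    Thread.sleep(pollMillis);
                  } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();   // preserve the interrupt status
                    throw new IOException("startup interrupted while waiting for dependency");
                  }
                }
              }
            }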

          Hide
          Stu Hood added a comment -

          Hey Steve, what is the status of this issue? Have you decided to move forward with the "separate issues per component" approach? Integrating lifecycle into the different daemons is very attractive, since as you pointed out, it allows for ZooKeeper management.

          Hide
          steve_l added a comment -

          Stu,

          I've not touched this code for a bit because it was working well enough to show we could push out dynamically configured Hadoop installations; now I've got sucked into the other half of the problem: asking for allocated real/virtual machines and creating valid configurations for the workers based on the master nodes' hostnames, pushing them out, etc. Which is a good test case for all this dynamic stuff, but the other bits and their tests do take up a lot of time.

          I should put up a plan for doing this properly, something like:

          • switch to Git to keep changes more isolated
          • write some tests to demonstrate the problems with the JobTracker hanging if the filesystem isn't there
          • come up with a solution that involves interrupting threads or similar
          • get the base changes into -common, then worry about hdfs and mapred as separate issues

          Thoughts?

          Hide
          Tom White added a comment -

          switch to Git to keep changes more isolated

          And make maintaining patches easier (using rebase). See http://www.apache.org/dev/git.html#workflow.

          get the base changes into -common, then worry about hdfs and mapred as separate issue

          +1 This should make the changes more isolated too, and make them easier to review.

          Hide
          Steve Loughran added a comment -

          These are the patches for the -hdfs JAR to bring the NN and DN under the lifecycle patch supplied with the latest version of HADOOP-6194. I'm not submitting this as a patch, as it will only build against that version; it's here for people to see. The tests all appear to pass on my machine.

          This patch has a few interesting features:

          1. Does not start the service in the constructor, to make subclassing it safer; the secondary NN benefits from this (see the sketch below).
          2. Relies on the static factory methods to create and start instances of primary or secondary NNs.
          3. Adds more shutdown logic.

          The extra shutdown stuff could be separated off from the rest of the patch and applied first -it would make this patch smaller.
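          A sketch of the create-then-start shape described in point 1, with illustrative names only; the real patch does this through the NameNode's static factory methods:

            import java.io.IOException;

            class ServiceSketch {

              protected ServiceSketch() {
                // field initialisation only; no threads, no network, no overridable calls
              }

              protected void start() throws IOException {
                // open ports, replay logs, spawn worker threads, ...
              }

              // Callers use the factory rather than the constructor, so start() only ever
              // runs on a fully constructed (and fully initialised subclass) instance.
              static ServiceSketch createAndStart() throws IOException {
                ServiceSketch service = new ServiceSketch();
                service.start();
                return service;
              }
            }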

          Hide
          Steve Loughran added a comment -

          One test, TestFsck, fails on OS X; I'm not sure whether it is related to this patch or not. Thoughts?

          2010-04-28 22:14:36,824 INFO  security.Groups (Groups.java:<init>(60)) - Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
          2010-04-28 22:14:36,828 DEBUG security.UserGroupInformation (FSPermissionChecker.java:checkPermission(115)) - ACCESS CHECK: org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker@a01ec0b, doCheckOwner=false, ancestorAccess=null, parentAccess=null, access=null, subAccess=null
          2010-04-28 22:14:36,829 DEBUG security.UserGroupInformation (FSPermissionChecker.java:checkPermission(115)) - ACCESS CHECK: org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker@42fcb4f, doCheckOwner=false, ancestorAccess=null, parentAccess=null, access=READ_EXECUTE, subAccess=null
          2010-04-28 22:14:36,830 WARN  namenode.NameNode (NamenodeFsck.java:fsck(180)) - Fsck on path '/dfsck' FAILED
          org.apache.hadoop.security.AccessControlException: Permission denied: user=ProbablyNotARealUserName, access=READ_EXECUTE, inode="dfsck":slo:supergroup:rwx------
          	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:207)
          	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:142)
          	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4016)
          	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPathAccess(FSNamesystem.java:3980)
          	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getListing(FSNamesystem.java:2244)
          	at org.apache.hadoop.hdfs.server.namenode.NameNode.getListing(NameNode.java:937)
          	at org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.check(NamenodeFsck.java:250)
          	at org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.fsck(NamenodeFsck.java:159)
          	at org.apache.hadoop.hdfs.server.namenode.FsckServlet$1.run(FsckServlet.java:65)
          	at java.security.AccessController.doPrivileged(Native Method)
          	at javax.security.auth.Subject.doAs(Subject.java:396)
          	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:738)
          	at org.apache.hadoop.hdfs.server.namenode.FsckServlet.doGet(FsckServlet.java:53)
          	at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
          	at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
          	at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
          	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1166)
          	at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:760)
          	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
          	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388)
          	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
          	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
          	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
          	at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
          	at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
          	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
          	at org.mortbay.jetty.Server.handle(Server.java:326)
          	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
          	at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:923)
          	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:547)
          	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
          	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
          	at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
          	at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
          Permission denied: user=ProbablyNotARealUserName, access=READ_EXECUTE, inode="dfsck":slo:supergroup:rwx------
          
          
          Fsck on path '/dfsck' FAILED
          
          Hide
          Steve Loughran added a comment -

          Superseded by YARN, though a lot of the work here needs to be pulled in there, which can be done on a case-by-case basis.


            People

            • Assignee:
              Steve Loughran
              Reporter:
              Steve Loughran
            • Votes:
              3
              Watchers:
              32
