Hadoop HDFS / HDFS-435

Add orthogonal fault injection mechanism/framework

    Details

    • Type: Test
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: test
    • Labels: None
    • Release Note:
      New fault injection framework, based on AspectJ, improves testing. See the Jira item for the users' guide.

      Description

      It'd be great to have a fault injection mechanism for Hadoop.

      Having such a solution in place will make it possible to increase test coverage of error-handling and recovery mechanisms, reduce reproduction time, and increase the reproduction rate of problems.

      Ideally, the system has to be orthogonal to the current code and test base: faults have to be injected at build time and have to be configurable, e.g. all faults could be turned off, or only some of them allowed to happen. Also, fault injection has to be kept separate from the production build.


          Activity

          Konstantin Boudnik added a comment -

          I would like to propose the following initial requirements for a Fault Injection (FI) solution for Hadoop:

          1. Has to be orthogonal to the existing source code and test base: no direct code or test modifications needed, preferably based on a cross-cutting model
          2. Fully detachable: faults can be inserted into or removed from the system without hassle; a separate build target has to introduce faults in place with a single command, and removal should be equally easy
          3. High level of fault abstraction: the faults' logic has to be implemented in a high-level language, e.g. Java
          4. Needs to reuse existing unit/functional tests where possible
          5. Fine-grained configuration at runtime: fully deterministic or random injection of the faults should be configurable at runtime through a configuration file or a set of system properties - no source code modifications or re-compilation required
          6. If an off-the-shelf solution is used, it should preferably come under an Apache-compatible open-source license
          Konstantin Boudnik added a comment - - edited

          Here's an overall proposal for the framework layout:

          • AspectJ 1.6 should be used as the base framework
          • an additional set of classes needs to be developed to control and configure the injection of faults at runtime; in the first version of the framework, I'd recommend going with randomly injected faults (random in terms of when they happen, not their location in the application code) - see the sketch after the examples below
          • the randomization level might be configured through system properties from the command line or set in a separate configuration file
          • to completely turn off fault injection for a class, the probability level has to be set to 0% ('zero'); setting it to 100% achieves the opposite effect
          • build.xml has to be extended with a new target ('injectfaults') to weave the needed aspects in place after the normal compilation of the Java classes is done; JUnit targets will have to be modified to pass the new probability configuration parameters into the spawned JVM
          • the aspects' source code will be placed under test/src/aop; the package structure will mimic the original one of Hadoop, e.g. an aspect for FSDataset has to belong to org.apache.hadoop.hdfs.server.datanode

          Some examples of new build/test execution interface:

          To weave (build-in) aspects in place:

          • % ant injectfaults

          To execute the HDFS tests (turning everything off except the BlockReceiver faults, which are set at the 10% level):

          • % ant run-test-hdfs -DallFaultProbability=0 -DBlockReceiverFaultProbability=10
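
          A minimal sketch of what such a fault aspect could look like, assuming AspectJ 1.6 (the advice body and the property plumbing here are illustrative, not the actual patch):

          package org.apache.hadoop.hdfs.server.datanode;

          import java.io.IOException;
          import java.io.OutputStream;
          import java.util.Random;

          public aspect BlockReceiverAspects {
            private static final Random RANDOM = new Random();

            // Injection probability in percent, read from a system property:
            // 0 turns the fault off entirely, 100 fires it on every matched call.
            private static int probability() {
              return Integer.getInteger("BlockReceiverFaultProbability", 0);
            }

            // Match calls to OutputStream.write(..) made from within
            // BlockReceiver.receivePacket(..), excluding this aspect's own code.
            pointcut callReceivePacket() :
              call(* OutputStream.write(..))
              && withincode(* BlockReceiver.receivePacket(..))
              && !within(BlockReceiverAspects+);

            // Before each matched write, fail with the configured probability,
            // simulating a disk error in the middle of receiving a packet.
            before() throws IOException : callReceivePacket() {
              if (RANDOM.nextInt(100) < probability()) {
                throw new IOException("Injected fault: simulated write failure");
              }
            }
          }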
          Konstantin Boudnik added a comment -

          My patch is pretty much ready and requires a couple of libraries to be added to the Hadoop project. These libraries aren't associated with any of Apache's projects: they are under the Eclipse Public License and are distributed from their website.

          I'm not sure what the 'rule of thumb' is for adding libraries to the Ivy configuration for Hadoop. Or shall they be added statically, e.g. into the SVN repository? I assume the latter is generally a bad idea, which leaves us with the former option.

          Can any of the watchers comment on this, please?

          Tsz Wo Nicholas Sze added a comment -

          >% ant run-test-hdfs -DallFaultProbability=0 -DBlockReceiverFaultProbability=10

          It may be better for the naming convention to use something like fault.probability.*, fault.probability.datanode.BlockReceiver, etc.
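
          For illustration, the example above might then read as follows (property names hypothetical, following the suggested scheme):

          • % ant run-test-hdfs -Dfault.probability.*=0 -Dfault.probability.datanode.BlockReceiver=10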

          Konstantin Boudnik added a comment -

          Thanks for the suggestion, Nicholas. I like your way (prefixing with fault.probability) better, and I'm putting it into the patch right away.

          As for the suffix of the name, it'd be completely up to the aspect developers to name it. However, I agree that datanode.BlockReceiver would be more mnemonically appealing.

          Konstantin Boudnik added a comment -

          It seems that none of the current Maven repos have AspectJ 1.6.4 in place. The latest version available is 1.5.4, which won't work because Hadoop is a Java 6 project.

          Any idea how to add the latest version of a library to a Maven repository?

          Giridharan Kesavan added a comment -

          We can file a JIRA with Codehaus with the location of the AspectJ jar file and its POM, so they can help us upload the latest version of AspectJ to the Maven repository.

          BTW, I see different AspectJ jar files in here - some of them are at version 1.5.4 and some are at version 1.6.4:
          http://www.mvnrepository.com/search.html?query=aspectj

          Could you please mention the name of the AspectJ jar that you are looking for?

          Konstantin Boudnik added a comment -

          Great! Thanks for the pointer - I saw only 1.5.4 in there and somehow missed the latest version. It worked, so I will publish the patch shortly.

          Konstantin Boudnik added a comment -

          Fault Injection Framework How to and faults development guide

          dhruba borthakur added a comment -

          Very cool stuff! And the guide is very helpful. I have some questions from the user guide.

          pointcut callReceivePacket() :
            call(* OutputStream.write(..))
            && withincode(* BlockReceiver.receivePacket(..))
            // to further limit the application of this aspect, a very narrow 'target' can be used as follows
            // && target(DataOutputStream)
            && !within(BlockReceiverAspects+);

          Can you please explain the above in detail - what it means, etc.? Things like "pointcut" and "withincode" - are these AspectJ constructs? What is the intention of the above declaration? Thanks.

          Konstantin Boudnik added a comment -

          The document has been updated with additional explanations of aspect specifics such as pointcuts.
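          In short - and as a rough annotated sketch rather than a quote from the updated document - the declaration can be read as follows:

          // 'pointcut' is an AspectJ construct: it names a set of join points
          // (well-defined points in the program's execution) at which advice -
          // here, the injected fault code - may run.
          pointcut callReceivePacket() :
            // match every call to any write(..) method of OutputStream,
            // whatever its return type and arguments,
            call(* OutputStream.write(..))
            // ...but only calls made from inside the body of
            // BlockReceiver.receivePacket(..),
            && withincode(* BlockReceiver.receivePacket(..))
            // ...and never from within this aspect itself ('+' also matches
            // sub-aspects), so the advice does not end up advising its own code.
            && !within(BlockReceiverAspects+);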
          Thank you for the review, Dhruba!

          Tsz Wo Nicholas Sze added a comment -

          Yes, the guide is very useful for AOP test development. We should check in the doc.

          Dhruba, where should we put the doc? Any idea?

          Tsz Wo Nicholas Sze added a comment -

          I have created some AOP tests in HDFS-483 for testing the pipeline.

          Konstantin Boudnik added a comment -

          I'd imagine that we can check test-related documents and examples into src/docs/test, so that they would be generated into HDFS's documentation by Forrest.

          Also, it'd be great to have a set of examples to demonstrate different techniques of fault injection development for tests. Nicholas' HDFS-483 is a great start!

          dhruba borthakur added a comment -

          > Dhruba, where should we put the doc? Any idea?

          Docs typically go into src/docs. But we want the doc in an editable and open format, and PDF is not editable. One option is to convert it into Forrest XML and check it into src/docs/src/documentation/content/xdocs.

          Konstantin Boudnik added a comment -

          The PDF format was chosen for the convenience of future readers. I think your suggestion is valid - I'll open a separate JIRA for this and will take care of the conversion.

          Nigel Daley added a comment -

          Yes, the doc should be converted to forrest and put in src/docs/src/documentation/content/xdocs

          Konstantin Boudnik added a comment -

          It has already been done - please see HDFS-498, a subtask of this JIRA.

          Konstantin Boudnik added a comment -

          I believe that all aspects of this JIRA have been addressed by the appropriate subtasks. Thus, I propose to resolve this umbrella JIRA as 'Fixed'.

          Tsz Wo Nicholas Sze added a comment -

          > ... I propose to resolve this umbrella JIRA as 'Fixed'
          +1

          Tsz Wo Nicholas Sze added a comment -

          Closing this since all subtasks are fixed.

          Robert Chansler added a comment -

          Editorial pass over all release notes prior to publication of 0.21.


            People

            • Assignee: Konstantin Boudnik
            • Reporter: Konstantin Boudnik
            • Votes: 2
            • Watchers: 18
