Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.21.0
    • Fix Version/s: 0.21.0
    • Component/s: test
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Tags:
      herriot

      Description

      Hadoop would benefit from having a large-scale, automated test framework. This JIRA is meant to be a master JIRA to track the relevant work.


      The proposal is a junit-based, large-scale test framework which would run against real clusters.

      There are several pieces we need to achieve this goal:

      1. A set of utilities we can use in JUnit-based tests to work with real, large-scale Hadoop clusters: e.g. utilities to deploy, start & stop clusters, bring down tasktrackers, datanodes, entire racks of both, etc.
      2. Enhanced controllability and inspectability of the various components in the system: e.g. daemons such as the namenode and jobtracker should expose their data structures for query/manipulation. Tests would be much more relevant if we could, for example, query for specific states of the jobtracker, scheduler, etc. Clearly these APIs should not be part of production clusters - hence the proposal is to use AspectJ to weave these new APIs into debug deployments.

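      The AspectJ weaving mentioned in item 2 could be wired into the Ant build roughly as follows. This is only a sketch: the target name, property names, and paths are assumptions for illustration, not the actual Hadoop build.

      ```xml
      <!-- Hypothetical Ant target: compile test-only aspects and weave them
           into the already-built Hadoop classes, producing an instrumented
           "debug" jar. All names and paths here are illustrative. -->
      <target name="weave-system-test-aspects" depends="compile">
        <taskdef resource="org/aspectj/tools/ant/taskdefs/aspectjTaskdefs.properties"
                 classpath="${aspectj.tools.jar}"/>
        <!-- iajc compiles the aspect sources and weaves them into inpath -->
        <iajc inpath="${build.classes}"
              sourceroots="${src.test.system.aspects}"
              destdir="${build.system.classes}"
              classpathref="build.classpath"
              source="1.6" target="1.6"/>
        <jar jarfile="${build.dir}/hadoop-${version}-instrumented.jar"
             basedir="${build.system.classes}"/>
      </target>
      ```

      Production builds would simply skip such a target, so the woven query/manipulation APIs never reach production jars.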
      Related note: we should break up our tests into at least 3 categories:

      1. src/test/unit -> Real unit tests using mock objects (e.g. HDFS-669 & MAPREDUCE-1050).
      2. src/test/integration -> Current junit tests with Mini* clusters etc.
      3. src/test/system -> HADOOP-6332 and its children

      Attachments

      1. 6332_v1.patch
        20 kB
        Sharad Agarwal
      2. 6332_v2.patch
        24 kB
        Sharad Agarwal
      3. HADOOP-6332-MR.patch
        11 kB
        Konstantin Boudnik
      4. HADOOP-6332.patch
        13 kB
        Konstantin Boudnik
      5. HADOOP-6332.patch
        13 kB
        Konstantin Boudnik
      6. HADOOP-6332-MR.patch
        13 kB
        Konstantin Boudnik
      7. 6332.patch
        86 kB
        Sharad Agarwal
      8. 6332.patch
        183 kB
        Konstantin Boudnik
      9. 6332.patch
        183 kB
        Konstantin Boudnik
      10. 6332-phase2.patch
        358 kB
        Konstantin Boudnik
      11. 6332-phase2.fix1.patch
        2 kB
        Konstantin Boudnik
      12. 6332-phase2.fix2.patch
        1 kB
        Konstantin Boudnik
      13. HADOOP-6332.0.22.patch
        256 kB
        Konstantin Boudnik
      14. HADOOP-6332.0.22.patch
        253 kB
        Konstantin Boudnik
      15. HADOOP-6332.0.22.patch
        253 kB
        Konstantin Boudnik
      16. HADOOP-6332.0.22.patch
        253 kB
        Konstantin Boudnik
      17. HADOOP-6332.0.22.patch
        254 kB
        Konstantin Boudnik
      18. HADOOP-6332.0.22.patch
        249 kB
        Konstantin Boudnik
      19. HADOOP-6332.0.22.patch
        267 kB
        Konstantin Boudnik
      20. HADOOP-6332.0.22.patch
        267 kB
        Konstantin Boudnik

        Issue Links

          Activity

          Konstantin Boudnik added a comment -

          Adding a reference to a scripting framework discussed in the past (HADOOP-6248)

          Konstantin Boudnik added a comment -

          All subtasks are completed and I'm resolving this as fixed. HDFS/MR specific parts of the framework are tracked by HDFS-1134 and MAPREDUCE-1774 respectively.

          Doug Cutting added a comment -

          > I agree that Common shouldn't be treated as the hadoop.util project, but this seems correct.

          Okay, sounds reasonable to me.

          Chris Douglas added a comment -

          I had a conversation with Cos and learned that I completely misapprehended Herriot's scope. As a subproject, its purpose would be to pull down Hadoop jars and instrument them. While it would be possible to structure it this way, adding a target to produce instrumented jars is far more coherent and maintainable than maintaining a parallel build system. I agree that Common shouldn't be treated as the hadoop.util project, but this seems correct.

          Konstantin Boudnik added a comment -

          Do we expect to add system tests specific to Common?

          Hmm, it depends. I'd say org/apache/hadoop/fs is a good candidate for this sort of test.

          Doug Cutting added a comment -

          > But this holds only as long as we don't have any system tests specific to Common.

          Do we expect to add system tests specific to Common?

          Konstantin Boudnik added a comment -

          Doug, the Herriot concept is apparently very useful beyond the HDFS/MR projects. However, the concrete implementation is very specific to these components. As Sharad mentioned above, this is Hadoop test code.

          At the moment it seems technically possible to separate Common's part of Herriot from Common itself. But this holds only as long as we don't have any system tests specific to Common.

          Doug Cutting added a comment -

          > MR and HDFS today share a lot of things from Common, like Configuration etc. Similarly, Herriot's common functionality is abstracted and put in Common.

          Our long-term goal should probably be to diminish Common as a grab-bag of shared bits of code for MR and HDFS. Rather, it would be better if the shared bits were separate projects or subprojects that are independently useful. So, if we think Herriot is of use beyond HDFS and MR then perhaps it ought to be a separate project. Similarly, long-term, RPC and Configuration might eventually become artifacts that other projects can use independently, rather than as a part of Common.

          Sharad Agarwal added a comment -

          Herriot is test code. Shouldn't test code stay with the project it is intended for? MR and HDFS today share a lot of things from Common, like Configuration etc. Similarly, Herriot's common functionality is abstracted and put in Common.

          If cluttering of the build files and source tree is a concern, it can be a contrib project.

          Konstantin Boudnik added a comment -

          What is required to compile the aspects? If source is not required, can the AOP code live in the Herriot project and be compiled against the jars published by maven?

          Thanks for the reminder - it had totally slipped my mind... Either the source code of the 'target' classes is needed for successful weaving, or (as you suggest) we'll have to instrument target jars pulled down from Maven. Which is... well, suboptimal.

          Chris Douglas added a comment -

          all visible changes in the build system will be the same + a lot of stuff from src/test/aop/build/aop.xml will have to be brought into the Common, HDFS, and MR builds anyway.

          we'll need to have a source code dependency on Hadoop's subprojects at framework development time to make sure the aspects are binding right, etc.

          This is why I'm asking about packaging. Building (and supporting) artifacts for Herriot in Common, HDFS, and MapReduce as part of their normal compile is sub-optimal. What is required to compile the aspects? If source is not required, can the AOP code live in the Herriot project and be compiled against the jars published by maven?

          Konstantin Boudnik added a comment -

          Right, agreed. Makes sense. Are we going to fully isolate MR from Common? I.e. the two won't have even jar dependencies? Because this is exactly how the MR (HDFS) parts of the test framework depend on the Common part of the framework - via a jar dependency.

          If you suggest cutting this off then we'll have to introduce another dependency, on the test framework's artifact, instead. It doesn't appear very natural to me in the case of a software system and its embedded test framework, but it can be done, of course.

          Doug Cutting added a comment -

          > And I really don't see any advantage of the separation

          If the long-term intention is still to split HDFS and Mapreduce into separate projects, then we should reduce their interdependencies, i.e. reduce what's in Common rather than add more things into Common.

          Konstantin Boudnik added a comment -

          I'm not saying it is impossible to do as a separate project. Packaging problem isn't an issue. In fact, current approach will publish instrumented artifacts separately too.

          Now, to weave aspects one doesn't need to have the source code available at build time: compiled aspects should be sufficient. However, keeping the framework out of Hadoop's source tree has a two-fold problem:

          • all visible changes in the build system will be the same + a lot of stuff from src/test/aop/build/aop.xml will have to be brought into the Common, HDFS, and MR builds anyway.
          • we'll need to have a source code dependency on Hadoop's subprojects at framework development time to make sure the aspects are binding right, etc.

          These are disadvantages. And I really don't see any advantage of the separation besides reducing the number of source files under src/test/system.

          Also, please keep in mind that this test framework is Hadoop specific so it seems logical to keep them together.

          Chris Douglas added a comment -

          Is it a packaging problem? As the source is available through maven (HADOOP-6635, HDFS-1047, MAPREDUCE-1613), if Hudson published snapshots, would that be sufficient?

          That it doesn't affect the production code seems to support the argument that Herriot should be a subproject...

          Konstantin Boudnik added a comment -

          The main reason this has been done as part of the test infrastructure for Common (HDFS, MR are coming) is that the framework is non-invasive and doesn't have any footprint in the production code of Hadoop. However, system tests need more functionality than the regular public API provides. To achieve this we had to use AOP. At the very least, compiled aspects have to be provided and then woven into Hadoop's classes. The framework part (aspects and all) might be kept separate from the main code tree. But at any rate this means changes to the build process. And it also would add a lot of complexity to framework development/maintenance.

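          Weaving pre-compiled aspects, as described above, could be sketched with the AspectJ Ant task using an aspectpath instead of aspect sources; the property names here are hypothetical, not the actual build's:

          ```xml
          <!-- Hypothetical: weave already-compiled Herriot aspects (a binary
               aspect library) into the built Hadoop classes. Property names
               are illustrative. -->
          <iajc inpath="${hadoop.core.jar}"
                aspectpath="${herriot.aspects.jar}"
                outjar="${build.dir}/hadoop-core-instrumented.jar"
                classpathref="build.classpath"/>
          ```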
          Doug Cutting added a comment -

          Should we really be adding this to Common, or might this be better as a new Herriot subproject?

          Konstantin Boudnik added a comment -

          Have spoken with Sharad off-line and his suggestion is to change the Maven id for the framework artifacts to hadoop-core-system-test. I'll open a sub-task for it and do the patch.

          Konstantin Boudnik added a comment -

          One relatively small issue I want to get input on concerns publishing the test framework's artifacts to the Maven repo.

          I am creating artifacts with the hadoop-core-system id right now. Does that sound like a good choice of name? Does anyone have comments or a better suggestion?

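          A downstream build could then pull the published artifact by that id, e.g. via an Ivy dependency. This is only a sketch: the organisation, revision property, and conf mapping are assumptions for illustration.

          ```xml
          <!-- Hypothetical Ivy dependency on the Herriot framework artifact. -->
          <dependency org="org.apache.hadoop"
                      name="hadoop-core-system"
                      rev="${hadoop.version}"
                      conf="system-test->default"/>
          ```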
          Konstantin Boudnik added a comment -

          I have committed this to trunk and the 0.21 branch. Have run all tests locally once more. All seems OK.

          Konstantin Boudnik added a comment -

          I'll wait until tomorrow in case someone has more comments, and then will commit it.

          Konstantin Boudnik added a comment -

          The JIRA should clearly target 0.21 and above. My earlier change was confusing.

          Konstantin Boudnik added a comment -

          I made the move from 0.21 to 0.22 to emphasize that the work is getting done on trunk. All patches should be applicable to 0.21 as well. Sorry for the confusion. I have just checked - the patch applies to 0.21 and will be committed to both 0.21 and trunk. I'll fix the JIRA's target.

          y20 is the Yahoo! internal release of Hadoop 0.20 where the initial work on this framework was performed. The original framework patches were published against that source code, hence the forward-porting work.

          Herriot is the 'working' name of the framework. It is named after James Herriot, a veterinarian.

          Stephen Watt added a comment -

          Hi Cos/Sharad

          I noticed this JIRA also got moved from targeting 0.21 to 0.22. Can you elaborate on that decision? I presume that is why the patches are targeting trunk.

          What are y20 security and Herriot?

          Sharad Agarwal added a comment -

          +1

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12444343/HADOOP-6332.0.22.patch
          against trunk revision 941662.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 48 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          -1 release audit. The applied patch generated 2 release audit warnings (more than the trunk's current 1 warnings).

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/517/testReport/
          Release audit warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/517/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/517/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/517/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/517/console

          This message is automatically generated.

          Konstantin Boudnik added a comment -

          Rechecking the patch once more.

          Konstantin Boudnik added a comment -

          I submitted the wrong file previously. Correcting. The previous comment is still valid, though.

          Konstantin Boudnik added a comment -

          This patch adds the Herriot sources to the source.jar file, removes a dependency on JUnit v3, and fixes some JavaDoc issues. Also, a couple of import optimizations are done.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12444175/HADOOP-6332.0.22.patch
          against trunk revision 941662.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 48 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          -1 release audit. The applied patch generated 2 release audit warnings (more than the trunk's current 1 warnings).

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/514/testReport/
          Release audit warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/514/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/514/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/514/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/514/console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12444175/HADOOP-6332.0.22.patch
          against trunk revision 941662.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 48 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          -1 release audit. The applied patch generated 2 release audit warnings (more than the trunk's current 1 warnings).

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h1.grid.sp2.yahoo.net/66/testReport/
          Release audit warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h1.grid.sp2.yahoo.net/66/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h1.grid.sp2.yahoo.net/66/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h1.grid.sp2.yahoo.net/66/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h1.grid.sp2.yahoo.net/66/console

          This message is automatically generated.

          Konstantin Boudnik added a comment -

          Verification for the patch with mvn:install support

          Konstantin Boudnik added a comment -

          This patch also adds a capability to mvn-install Herriot artifacts locally with the id hadoop-core-system. Now they can be pulled with the internal resolver into the HDFS and MR subprojects.

          Clearly, the Maven deployment will have to be added at some point.

          Konstantin Boudnik added a comment -

          Run verification one more time.

          Konstantin Boudnik added a comment -

          Addressing comments. jar-test-system is removed from the build.

          Some additional investigation shows that in the current 0.20 implementation the Herriot build also ships only the existing functional tests. This clearly needs to be fixed for 0.20 and trunk. But we don't need to target Common's trunk, because there are no system tests for the Common component alone.

          Konstantin Boudnik added a comment -

          Need to rerun the verification

          Konstantin Boudnik added a comment -

          Actually, I'm wrong about there being a problem in the original jar-test-system implementation. It looks like the jar-test target is implemented slightly differently in trunk, which causes this effect. Hmm...

          Konstantin Boudnik added a comment -

          system-test.xml need not go in common

          While I mostly agree that system-test.xml shouldn't be in common (a config file in common shouldn't have any knowledge of upstream dependencies), I am reluctant to split it. The problem with the split, as I see it, is that both copies of the file, in HDFS and MR, will mostly contain the same information with some minor differences. However, considering that exposing upstream dependencies is worse, I will make the split and post a new patch shortly.

          jar-test-system ant target

          Thanks for catching this one. Looks like we have the same problem in the original implementation and it has been missed. Will fix it.

          Sharad Agarwal added a comment -

          Skimmed the patch. Some minor comments:

          • system-test.xml need not go in common. It is neither used nor required by common code. We can split it into hdfs-system-test.xml and mapred-system-test.xml when working on the respective forward ports.
            Also $(YINST_ROOT) must be removed.
          • The jar-test-system ant target is building a jar with unit tests. It should contain only system tests. (Right now we don't have any system tests in common, so perhaps we can drop this target for now.)
          Konstantin Boudnik added a comment -

          The audit warning is about the absence of the Apache License boilerplate in the tests list file. I don't think it is possible to have it there. Besides, similar files in HDFS and MR don't have it. Let's punt on this.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12444028/HADOOP-6332.0.22.patch
          against trunk revision 941662.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 52 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          -1 release audit. The applied patch generated 2 release audit warnings (more than the trunk's current 1 warnings).

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/512/testReport/
          Release audit warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/512/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/512/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/512/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/512/console

          This message is automatically generated.

          Konstantin Boudnik added a comment -

          Issues found by test-patch are fixed. Resubmitting.

          Konstantin Boudnik added a comment -

          Addressing audit warning: missing Apache license boilerplate.

          Konstantin Boudnik added a comment -

          Missing tests list file is added.

          Konstantin Boudnik added a comment -

          The patch missed a file

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12444019/HADOOP-6332.0.22.patch
          against trunk revision 941662.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 49 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          -1 release audit. The applied patch generated 2 release audit warnings (more than the trunk's current 1 warnings).

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/511/testReport/
          Release audit warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/511/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/511/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/511/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/511/console

          This message is automatically generated.

          Konstantin Boudnik added a comment -

          Patch seems to be ready for verification.

          Konstantin Boudnik added a comment -

          Herriot artifacts are being produced as expected.
          Pushing them to Maven will be needed later on.

          This patch is ready to be used as a base for HDFS and MR forward patches of Herriot.

          Konstantin Boudnik added a comment -

          In this version of the patch all the old functionality of the build works as before.
          Herriot artifacts aren't produced yet, but this seems to be a pretty minor fix.

          Konstantin Boudnik added a comment -

          Very first draft of the forward patch for Common's trunk. It works through all four patches posted earlier for yahoo-0.20.

          Right now the build is passing. However, core tests are broken and no Herriot artifacts are being created. Will be fixing these bugs in the next couple of days.

          Konstantin Boudnik added a comment -

          Using $(something) screws up our XML processing; this has to be fixed.
          This patch is on top of 6332-phase2.fix2.patch. Not to commit here, for it will be done as part of the forward port patch later.

          Konstantin Boudnik added a comment -

          In the secured environment a client should make a privileged RPC call to access a FileSystem instance from an NN. Thus the fix.

          This patch has to be applied on top of 6332-phase2.patch. Not for inclusion here.

          Konstantin Boudnik added a comment -

          This is the second portion of the main Herriot functionality, including some of the tests already linked to the JIRA.

          This patch isn't for commit to the Apache 0.20 branch, but is reference material for the coming forward port to trunk (0.22). During the forward port process the tests (about 7 of them or so) from this patch will be taken out and finally replaced with the patches attached to the linked JIRAs.

          Konstantin Boudnik added a comment -

          A tiny inconsistency in the build.xml has been discovered. Fixed.

          Konstantin Boudnik added a comment -

          This is a patch for y20-security which might have conflicts with the current 0.20 branch.
          We'll be providing a forward port patch for the trunk soon.

          Konstantin Boudnik added a comment -

          @Stephen: the main reason to use code injection is to completely hide the testing handles from any chance of misuse by a stranger. Many of the contracts (interfaces, APIs) we are interested in in the course of testing either unveil internal states of key Hadoop components or allow performing 'undesirable' actions such as killing a job, a tasktracker, or a datanode, so it'd be unwise to keep them in the A-grade production code. Therefore, code injection seems to be the right technique for this.

          The next version of the patch is coming any minute now. It will make clear that all interfaces exposed to tests are defined statically. Their implementation is injected, though, which shouldn't concern anyone but framework developers.

          Now, the particular implementation of injection doesn't really matter. We could have gone with ASM or BCEL for the purpose. It happens that we have AspectJ readily available, providing high-level language capabilities, Eclipse integration, etc. That explains the choice of the framework.

          As for an extra burden for future contributors: instrumentation is used for internal framework mechanics and shouldn't be exposed to the test developers. Thus, if one simply wants to develop a cluster test, one can do it from a vanilla Eclipse without AJDT installed. Or from IDEA (which I personally prefer and use all the time, except when I need to develop/fix some aspects). Or from vim (not that I suggest doing it).
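The statically-declared-interface / injected-implementation split described above can be sketched in plain Java. All class names below are invented for illustration; the real weave is done with AspectJ inter-type declarations, which this sketch simulates with an ordinary subclass so it compiles without ajc:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Statically declared test interface: visible to tests, harmless by itself.
interface DaemonProbe {
  int queuedTaskCount();
}

// Production-like class: exposes nothing test-specific.
class TaskTracker {
  protected final Deque<String> queue = new ArrayDeque<>();
  void enqueue(String task) { queue.add(task); }
}

// What a debug deployment would weave in: an implementation of the probe.
// A production deployment simply never ships this piece.
class InstrumentedTaskTracker extends TaskTracker implements DaemonProbe {
  public int queuedTaskCount() { return queue.size(); }
}

public class WeaveSketch {
  public static void main(String[] args) {
    TaskTracker tt = new InstrumentedTaskTracker();
    tt.enqueue("attempt_0001");
    // A system test inspects internal state only through the probe interface.
    System.out.println(((DaemonProbe) tt).queuedTaskCount());
  }
}
```

The point is that the interface is fixed at compile time (tests can link against it), while its implementation exists only in debug-woven builds.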

          Stephen Watt added a comment -

          @Sharad

          Thanks for the patch. Is there a reason why we're now incorporating Aspect Oriented Programming into the test framework?

          While I can appreciate the features it offers, when one considers the effort involved in getting an AOP runtime set up in an IDE, which is required to get folks writing and contributing test cases to the framework, I'm worried the additional effort/complexity is going to scare off would-be contributors.

          Sharad Agarwal added a comment -

          Work in progress patch.

          Sharad Agarwal added a comment -

          is the intention that one write their own Test Case as a normal Hadoop Job (such as TeraSort) as a separate activity and then one would get a handle to the MRCluster in the @before method, and then start the test by calling Job.submit() in the @test method and then be able to pass the jobID back to the JTClient to do whatever you needed to with it at that point ?

          The intention is that a test case can submit a job and be able to assert the state of various entities (Job/JT/TT: data structures, filesystem, etc.). Also it can potentially control the daemons by simulating a particular failure scenario. This should be clearer once I post the patch.
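The flow being described (obtain a cluster handle, submit a job, assert daemon-side state, simulate a failure) might look roughly like this; every class below is a stand-in invented for this sketch, not the patch's real MRCluster/JTClient API:

```java
import java.util.HashMap;
import java.util.Map;

public class JobStateAssertSketch {

  // Invented stand-in for a jobtracker client; the real one is in the patch.
  static class JTClient {
    private final Map<String, String> state = new HashMap<>();
    String submitJob() { state.put("job_0001", "RUNNING"); return "job_0001"; }
    String getJobState(String jobId) { return state.get(jobId); }
    void simulateFailure(String jobId) { state.put(jobId, "RECOVERING"); }
  }

  public static void main(String[] args) {
    JTClient jt = new JTClient();            // in a real test: from the cluster handle in @Before
    String job = jt.submitJob();             // @Test body: submit a job ...
    System.out.println(jt.getJobState(job)); // ... assert jobtracker-side state
    jt.simulateFailure(job);                 // ... trigger a failure scenario
    System.out.println(jt.getJobState(job)); // ... assert the recovery behavior
  }
}
```

Only the shape of the test is meaningful here; the state transitions are faked by the stub.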

          Stephen Watt added a comment -

          @Sharad/Cos

          1) Thanks for the offer to post the patch. FYI, you might want to check the pathing in the new patch, as the existing ones go back one too many directories, so they cannot find build.xml. It's not too big of an issue, as I am running everything from Eclipse at present.

          2) It's nice to have the TestCluster sample JUnit test as a starting point in using the framework.

          3) In the current patch there is a "Cluster" class referenced on line 15 of JTClient, but not implemented anywhere in the patch.

          4) Design: At present, we have the framework for cluster management and we have the M/R JobTracker Client. As to how someone would use this to run system tests on Hadoop... is the intention that one write their own Test Case as a normal Hadoop Job (such as TeraSort) as a separate activity and then one would get a handle to the MRCluster in the @before method, and then start the test by calling Job.submit() in the @test method and then be able to pass the jobID back to the JTClient to do whatever you needed to with it at that point ?

          Sharad Agarwal added a comment -

          We @Yahoo are working on this. I will post a patch in a couple of days, after getting it in a reasonable shape.

          Stephen Watt added a comment -

          I've written some of the necessary implementation classes to get a rough draft of this framework running. At present, it appears what we have is the ability to define and run the tests on a specific cluster, with some basic stop/start and fault injection features for the cluster management. However, after passing in all the correct values to the ShellProcessManager constructor (the class that identifies the cluster you want to run your unit test on) and attempting to call start() on my concrete implementation of the AbstractMasterSlaveCluster, I get the exception described below. Is anyone else seeing this? I get it on both OS X and Linux.

          Note: The directory exists and start-all works just fine.

          Exception in thread "main" java.io.IOException: Cannot run program "start-all.sh" (in directory "/home/hadoop/hadoop-0.20.1/bin"): error=2, No such file or directory
          at java.lang.ProcessBuilder.start(ProcessBuilder.java:459)
          at org.apache.hadoop.util.Shell.runCommand(Shell.java:149)
          at org.apache.hadoop.util.Shell.run(Shell.java:134)
          at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:286)
          at org.apache.hadoop.test.system.process.ShellProcessManager.execute(ShellProcessManager.java:71)
          at org.apache.hadoop.test.system.process.ShellProcessManager.start(ShellProcessManager.java:62)
          at org.apache.hadoop.test.system.AbstractMasterSlaveCluster.start(AbstractMasterSlaveCluster.java:64)
          at org.apache.hadoop.test.CheckClusterTest.main(CheckClusterTest.java:24)
          Caused by: java.io.IOException: error=2, No such file or directory
          at java.lang.UNIXProcess.forkAndExec(Native Method)
          at java.lang.UNIXProcess.<init>(UNIXProcess.java:53)
          at java.lang.ProcessImpl.start(ProcessImpl.java:91)
          at java.lang.ProcessBuilder.start(ProcessBuilder.java:452)
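One likely cause of the error=2 (ENOENT) above, offered here as a reading of the stack trace rather than a confirmed diagnosis: ProcessBuilder resolves a bare command name against PATH, not against the working directory passed to directory(), so "start-all.sh" is not found even though it exists in the bin directory. A self-contained sketch demonstrating both the failure and the absolute-path fix:

```java
import java.io.*;
import java.nio.file.*;
import java.nio.file.attribute.PosixFilePermissions;

public class PathResolution {

  // Create a throwaway executable script named like the one in the report.
  static Path makeScript() throws IOException {
    Path dir = Files.createTempDirectory("bin");
    Path script = dir.resolve("start-all.sh");
    Files.write(script, "#!/bin/sh\necho started\n".getBytes());
    Files.setPosixFilePermissions(script, PosixFilePermissions.fromString("rwxr-xr-x"));
    return script;
  }

  // Reproduces the failure: a bare name plus directory() throws IOException,
  // because the name is looked up on PATH (assuming the script's directory
  // and "." are not on PATH), not in the working directory.
  static boolean bareNameFails(Path script) {
    try {
      new ProcessBuilder("start-all.sh")
          .directory(script.getParent().toFile()).start();
      return false;
    } catch (IOException expected) {
      return true;
    }
  }

  // The fix: hand ProcessBuilder the absolute path to the executable.
  static String absolutePathWorks(Path script) throws Exception {
    Process p = new ProcessBuilder(script.toString()).start();
    try (BufferedReader r = new BufferedReader(
        new InputStreamReader(p.getInputStream()))) {
      return r.readLine();
    }
  }

  public static void main(String[] args) throws Exception {
    Path script = makeScript();
    System.out.println(bareNameFails(script));
    System.out.println(absolutePathWorks(script));
  }
}
```

The same reasoning would apply to whatever command string ShellProcessManager hands to Shell.runCommand: prefixing it with the target directory (or "./") should avoid the ENOENT.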

          Tom White added a comment -

          I would prefer to see a role-based approach in ClusterProcessManager (and other classes) since having explicit master/slave roles makes it difficult to support clusters with a separate namenode and jobtracker, or ZooKeeper (where all nodes are peers).
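A role-based layout of the kind suggested might look like this hypothetical sketch; the Role values and the manager's methods are invented for illustration, not the patch's API:

```java
import java.util.*;

public class RoleBasedClusterSketch {

  // Each daemon carries a role; "master"/"slave" never appears. A cluster
  // with a separate namenode and jobtracker, or an all-peer ZooKeeper
  // ensemble, is just a different role-to-hosts mapping.
  enum Role { NAMENODE, DATANODE, JOBTRACKER, TASKTRACKER, ZK_PEER }

  static class ClusterProcessManager {
    private final Map<Role, List<String>> hostsByRole = new EnumMap<>(Role.class);

    void register(Role role, String host) {
      hostsByRole.computeIfAbsent(role, r -> new ArrayList<>()).add(host);
    }

    List<String> hosts(Role role) {
      return hostsByRole.getOrDefault(role, Collections.emptyList());
    }

    // A real start() would invoke the remote daemon process on each host;
    // here it just reports how many hosts it would touch.
    int start(Role role) { return hosts(role).size(); }
  }

  public static void main(String[] args) {
    ClusterProcessManager cpm = new ClusterProcessManager();
    cpm.register(Role.NAMENODE, "nn1");
    cpm.register(Role.JOBTRACKER, "jt1"); // a different host: no single "master"
    cpm.register(Role.ZK_PEER, "zk1");
    cpm.register(Role.ZK_PEER, "zk2");    // peers, no master/slave distinction
    System.out.println(cpm.start(Role.ZK_PEER));
  }
}
```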

          Konstantin Boudnik added a comment -

          Now with modifications to the build, so the system can be compiled from the ant environment, all jars can be created, etc.

          Konstantin Boudnik added a comment -

          I've split the patch to its Common and Mapreduce parts. It should be easier to maintain now.

          Konstantin Boudnik added a comment -

          Sharad, do you think it makes sense to split the patch into its Common and MR parts? The patch is getting bigger and harder to apply to two different subprojects. We might keep both in here for now, just for convenience's sake...

          Sharad Agarwal added a comment -

          Changes from previous patch:

          • Added a representative set of observability APIs to DameonProtocol, JTProtocol and TTProtocol.
          • Introduced org.apache.hadoop.mapreduce.MRFault enum. The thought is to have the capability for tests to switch on/off a set of faults.
          • Added a couple of representative verification APIs in JTClient.
          Stephen Watt added a comment -

          Great discussion. Kos, et al., do you think we are at a point where we can consider starting to write some code? Another thing we need to do is identify which of the functional tests we are going to port. 12/11 is my last day but I will be back on 1/4. Not sure how much runway we have before 0.21 is due, but I'd like to see if we can at least have the framework plus a couple of tests available in time for the release.

          Konstantin Boudnik added a comment -

          Thanks for the answers, Sharad. All of it makes sense. One comment though:

          But seems like tests will benefit by having a control on start/stop daemons (for example to test lost/blacklisted TT, tests may want to kill a TT). How and which tar balls are pushed and deployed are not in scope of this because test cases need not bother about it.

          Right, the actual push of the bits has to be done somewhere else: Hudson or elsewhere.

          To work with already started cluster, a config flag something like NO_CLUSTER_START can be set which will let test suites skip the cluster start/stop step.

          My thought on this was that the restart of a cluster's components should be done in a way consistent with the setup/teardown approach of pretty much any test framework, JUnit included. If a test needs to start/stop a cluster, then it needs to specify @Before and @After methods which will do that using the provided control primitives (e.g. start_datanode.sh, stop_datanode.sh or whatever).
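          The setup/teardown shape described above might look like the JUnit 4 sketch below. DNControl and its methods are invented stand-ins for whatever control primitives (start_datanode.sh, stop_datanode.sh, ...) the framework ends up exposing; this is a structural illustration only, not a real framework class.

```java
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import static org.junit.Assert.assertFalse;

public class TestDatanodeRestart {

  // Hypothetical wrapper around start_datanode.sh / stop_datanode.sh;
  // the stub body here only tracks state so the sketch is self-contained.
  static class DNControl {
    private final String host;
    private boolean running;
    DNControl(String host) { this.host = host; }
    void start() { running = true; }   // would invoke start_datanode.sh on host
    void stop()  { running = false; }  // would invoke stop_datanode.sh on host
    void kill()  { running = false; }  // the fault being injected
    boolean isRunning() { return running; }
  }

  private DNControl datanode;

  @Before
  public void setUp() {
    datanode = new DNControl("host1");
    datanode.start();
  }

  @Test
  public void testDatanodeLossIsObserved() {
    datanode.kill();
    // a real test would query the namenode's injected APIs for its dead-node list
    assertFalse(datanode.isRunning());
  }

  @After
  public void tearDown() {
    datanode.stop();
  }
}
```

The point is that cluster bring-up stays out of the test body: @Before/@After own the lifecycle, and a suite-wide flag can skip them entirely against a pre-started cluster.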

          Sharad Agarwal added a comment -

          Thanks Konstantin for looking at this.

          shall we get rid of {{MasterSlaveCluster}} and use AbstractMasterSlaveCluster instead?

          Since tests are using MRCluster directly, having the MasterSlaveCluster interface is not adding value. When I started, I was not exposing MRCluster to tests. It was a private class in MRClusterFactory. Instead, tests were using the MasterSlaveCluster interface directly.
          So yes, we can get rid of MasterSlaveCluster.

          it seems like some of the classes in the proposed patch might benefit from HDFS-326.

          Some of the APIs proposed by HDFS-326 and this patch are indeed the same. The thought here is that these APIs are injected and only for tests. If some of these APIs (via HDFS-326 or otherwise) are in future considered worthy of having in the production code, then we can easily remove them from the test injection code and promote them into the regular code base.

          It was my understanding that the group attending the call felt that deployment (cluster setup/teardown) was not within the scope of the JIRA.

          Let me clarify the setup and teardown point. By cluster setup/teardown I mean cluster start/stop, not deployment. I agree that deployment should not be in the scope of this JIRA. But it seems tests will benefit from having control over starting/stopping daemons (for example, to test a lost/blacklisted TT, tests may want to kill a TT). How and which tarballs are pushed and deployed is not in the scope of this, because test cases need not bother about it.
          To work with an already-started cluster, a config flag, something like NO_CLUSTER_START, can be set to let test suites skip the cluster start/stop step.
          Makes sense?

          On a deeper look, I'd suggest not having hard-coded command names and environment variables.

          Perhaps we can have default names set in the code that can be overridden by setting a property.
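          That pattern, a default baked into the code but overridable via a property, is a one-liner with plain system properties. The property key and default script name below are made up for illustration (the actual names were not settled in this discussion); Hadoop's Configuration class would support the same get-with-default idiom.

```java
public class CommandNames {
  // hypothetical property key and default script name
  static final String START_SCRIPT_KEY = "test.system.start.script";
  static final String START_SCRIPT_DEFAULT = "start-dfs.sh";

  static String startScript() {
    // default from the code unless the property overrides it
    return System.getProperty(START_SCRIPT_KEY, START_SCRIPT_DEFAULT);
  }

  public static void main(String[] args) {
    System.out.println(startScript());            // default
    System.setProperty(START_SCRIPT_KEY, "my-start.sh");
    System.out.println(startScript());            // overridden
  }
}
```

In a test run the override would come from -Dtest.system.start.script=... on the command line rather than System.setProperty.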

          Konstantin Boudnik added a comment -

          Looks like I can't stop. Tests shouldn't be written for JUnit v.3. They have to use JUnit v.4 instead: annotations and all that.

          Konstantin Boudnik added a comment -

          On a deeper look, I'd suggest not having hard-coded command names and environment variables. Instead, it'd make sense to have a configuration file that describes whatever's needed. I can see why hard-coded script names are used for MapReduce, but I'd advocate avoiding such a practice wherever possible.

          Konstantin Boudnik added a comment -

          Great, thanks for putting it together, Stephen! And you're correct about the deployment: it should be out of the scope of this JIRA.

          Stephen Watt added a comment -

          Here is a wiki link that provides a synopsis of the discussions from the call as well as a proposed solution

          http://wiki.apache.org/hadoop/SystemTestingConfCallSynopsis

          NB: It was my understanding that the group attending the call felt that deployment (cluster setup/teardown) was not within the scope of the JIRA. The proposed solution involved a testing runtime that could be pointed at a variety of existing clusters, but the deployment of the clusters themselves was a separate concern.

          Konstantin Boudnik added a comment -

          Looks good overall. A couple of comments:

          • shall we get rid of {{MasterSlaveCluster}} and use AbstractMasterSlaveCluster instead? It will add more flexibility in the future
          • it seems like some of the classes in the proposed patch might benefit from HDFS-326. Shall these two be more synchronized with each other? Otherwise, we might end up with two sets of protocols serving a similar purpose but implemented differently.
          Konstantin Boudnik added a comment -

          I'd like to link these two together because they seem to be related in a sense.

          Sharad Agarwal added a comment -

          As mentioned above, the abstraction of cluster setup and teardown is well within the scope of this JIRA. The attached patch tries to address this. It also provides placeholders for exposing additional APIs from daemon processes, and a client interface to talk to the daemons.
          About this patch:

          • System test framework classes are in org.apache.hadoop.test.system
            • DaemonClient provides the interface to manage a particular remote daemon process.
            • DaemonProtocol is an RPC interface for a daemon. Note this needs to be woven into the server-side code via AspectJ.
            • The MasterSlaveCluster interface provides access to the master and slave client handles.
          • The abstraction for remote process management is org.apache.hadoop.test.system.process.ClusterProcessManager. The default implementation is ShellProcessManager, which uses the Hadoop bin scripts to start/stop the daemons. Apart from process management, if we later want to push the tarballs to cluster nodes etc., this interface can be exploited.
          • The implementation for mapreduce is in org.apache.hadoop.mapreduce. (Needs to be done in MAPREDUCE-1154. Putting it here for easy reference.)
            • The JTProtocol interface implementation needs to be woven into the JobTracker code. Similarly, TTProtocol into the TaskTracker code.
            • JTClient and TTClient are the client classes. Note that JTClient composes the org.apache.hadoop.mapreduce.Cluster class. For maintainability, the intention is to do minimal weaving and, if possible, avoid it on the client side. Generic verification utilities usable by all system test cases can live in the JTClient/TTClient/MRCluster classes.
            • Tests will create an MRCluster using the MRClusterFactory class. A sample test class is TestCluster. Perhaps we can have a test suite where the cluster is set up and torn down once. The tests in a particular suite are expected to be free of side effects.

          Thoughts ?
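          The flow described above might look roughly like this from a test's point of view. The names (MRCluster, JTClient, ClusterProcessManager) come from the patch description, but the stub bodies are invented purely to make the sketch self-contained; the real MRCluster would be obtained via MRClusterFactory and would shell out to the daemons through ShellProcessManager.

```java
// Invented stand-ins for the patch's interfaces; illustration only.
interface ClusterProcessManager { void start(); void stop(); }

class JTClient {
  // placeholder for an observability query served over the woven JTProtocol
  int liveTaskTrackers() { return 4; }
}

class MRCluster implements ClusterProcessManager {
  private boolean up;
  public void start() { up = true; }   // real impl: hadoop bin scripts
  public void stop()  { up = false; }
  boolean isUp() { return up; }
  JTClient getJTClient() { return new JTClient(); }
}

public class TestClusterSketch {
  public static void main(String[] args) {
    MRCluster cluster = new MRCluster(); // real tests: MRClusterFactory
    cluster.start();                     // suite-level setup
    System.out.println("cluster up: " + cluster.isUp());
    System.out.println("trackers: " + cluster.getJTClient().liveTaskTrackers());
    cluster.stop();                      // suite-level teardown
  }
}
```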

          Stephen Watt added a comment -

          Steve Loughran has created this WikiPage for our call - http://wiki.apache.org/hadoop/TestingNov2009

          Stephen Watt added a comment -

          I will be hosting this meeting via a skype conference. Please contact me and send me your skype name if you would like to be added to the participant list. The conference will be on 11/23 at 20:00 GMT (2PM CST / 12PM PST).

          steve_l added a comment -

          I could do Mon, Tue or Wed next week, that is the 23rd, 24th or 25th of November, at or after 20:00 GMT, which is what, midday Pacific? We could start by getting everyone interested in the problem to talk about what their use cases/needs are, and then discuss how to go about meeting them.

          I'll be connecting from home in the UK; assuming the majority of participants are in the Bay Area, it's probably best if someone there hosts the Skype meeting for lower latency and higher reliability. Any volunteers?

          Jeff Hammerbacher added a comment -

          Hey,

          Where do we stand on this issue? Should we try to arrange a call soon?

          Thanks,
          Jeff

          steve_l added a comment -

          @Arun - pushing out configurations to clusters partially explores the config space, but not very broadly; more leading-edge tricks involve machine generation of very different configurations, and/or pseudo-RNG-driven configuration option generation.

          Some videos on this topic

          1. Skoll: Distributed Continuous QA
          http://www.cs.umd.edu/~atif/papers/MemonICSE2004.pdf
          http://video.google.ca/videoplay?docid=8839342624264709864

          2. How we test - these are tests that run under JUnit from Ant/IDE, but can then bring up a cluster and run JUnit underneath. It gets complex.
            http://www.youtube.com/watch?v=NKshZGUWHJ4

          So, while I agree you do need ways to bring up clusters - indeed, I have some I can demo - I do think it is best done outside the JUnit test run itself:

          1. Ant tasks to allocate machines from different IaaS systems -that includes selecting from a list of physical machines you have to hand.
          2. Whatever we use to explore the configuration space runs very differently from inside a JUnit test run, because you want to create clusters with different options, then run the entire test suite. What is key is to get the output from that run and merge it with everything else.

          Like I said, we should have a phone conf about this before anyone starts coding. I'd like to see what Alex has done and I can show what I have, and I'd like to hear from Stephen about how IBM runs their tests too. How about everyone who is at ApacheCon meets up and talks about this, and then next week we can have an online get-together in some timezone that works for everyone?

          Stephen Watt added a comment -

          I support this proposal. At IBM, we're active users of Hadoop; however, we run into issues where we need to be able to test Hadoop on the versions of Java required for non-standard architectures. For instance, we'd like to investigate putting Hadoop through its paces on AS/400, z/OS or OS/390. To do that we have to use non-Sun Java distributions (such as IBM Java), as Sun does not provide a JVM for those architectures. This proposal would provide a means to standardize and streamline how we provide real-world testing for these architectures.

          At present, I'm using the Terabyte Gen/Sort/Validate jobs as they produce their own data, which greatly simplifies the test scripts, and they are easy to scale up and down.

          Lastly, from what I can gather, the framework is likely to be able to incorporate existing cluster environments. Thus, if one is executing an M/R test, it would run over whatever DFS the cluster is using, be it HDFS, Kosmos or S3. However, I only see an S3 sub-JIRA for this. Is the intent to support HDFS only?

          Alex Loddengaard added a comment -

          A potential use case for this tool would be to let Hadoop users put their jobs in a "test" and run them nightly on a (pseudo-distributed) cluster. I believe that having a framework that can use a running cluster or set up/tear down a new cluster will be handy for the mentioned use case. I also don't think that the setup/teardown stuff should dirty the code too much.

          Similarly, Nigel and I spoke a while back about having some sort of web dashboard where users posted version compatibility notes. Imagine a list of Hadoop users with check boxes next to each user that says "My 0.20.1 jobs worked in 0.20.2." I think this tool can play a role in the implementation of this idea, and setup/teardown and connecting to an existing cluster both make sense, I think.

          Arun C Murthy added a comment -

          Steve - Lots of tests may well work with an already-running cluster, but having utilities to set up/tear down clusters (in a pluggable manner) is well within the scope of this jira, I think. We need to be able to poke the corners of these areas of Hadoop in an automated manner too...

          steve_l added a comment -

          Thinking about this a bit more, I want to make clear that I don't think we should abandon JUnit as the Java framework for writing tests in. It's simple, it's extensible, it works. I do think that the report output format is limited, but that can be addressed with a better test runner, one that pulls in output from more than one process at specific log levels, and such like. That's a feature to add later.

          What I do want to do is decouple cluster instantiation from those tests that just need a working cluster - all those whose setup/teardown create MiniMR and MiniDFS clusters. These in-VM clusters are good for debugging and getting all the log output, but unrealistic: single VM, no native code, not started via the shell scripts.

          One option is to leave the existing test suites alone and start a new module, hadoop-cluster-test, that:

          1. Lets people bring up their own clusters however they choose (out of scope). However the cluster comes up, a properties file needs to be set up with the URLs of the filesystem and job tracker.
          2. Contains tests that are written to be run against large, live clusters. The setup code doesn't need to bring up a cluster; it may need to clean up old output.
          3. Possibly: has a shared static dataset for real testing. Size is the issue here, but some things could be generated, driven by pseudo-random numbers for replicability.
          4. Publishes its test code as a JAR + build.xml that can be run against your own cluster.
          5. Provides somewhere to experiment with better logging and test execution.
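          Point 1's contract could be as small as a properties file with the two URLs, read at the start of the run. The key names below follow the fs.default.name / mapred.job.tracker configuration keys of that era; the host names are invented, and the file is inlined only to keep the sketch self-contained.

```java
import java.io.StringReader;
import java.util.Properties;

public class ClusterBinding {
  public static void main(String[] args) throws Exception {
    // In practice this would be new FileReader("cluster.properties"),
    // written out by whatever brought the cluster up.
    String file =
        "fs.default.name=hdfs://nn.example.com:8020\n" +
        "mapred.job.tracker=jt.example.com:8021\n";
    Properties p = new Properties();
    p.load(new StringReader(file));
    System.out.println("fs: " + p.getProperty("fs.default.name"));
    System.out.println("jt: " + p.getProperty("mapred.job.tracker"));
  }
}
```

Tests then point their Configuration at these two values instead of instantiating a Mini* cluster.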
          steve_l added a comment -

          There are a number of use cases that a big test framework can handle, and while they shouldn't be interdependent, it would be nice to have tests that work with all of them:

          1. Bringing up Hadoop clusters by asking IaaS systems for the machines, instantiating the cluster, then testing it to see that it works. This is what I do. I normally just run Paolo Castagna's citerank code against the cluster; it's a small-dataset MR sequence that can take a couple of hours to run through.
          2. Testing that the latest build works on a pre-allocated physical/virtual cluster. You don't need to ask for the machines, but you may need to push out the JARs/RPMs.
          3. Testing that a physical cluster works at the speeds to be expected from the number of disks and cores.
          4. Testing that MR algorithms work, and work at scale.
          5. Testing all the corner bits of Hadoop: the code, the web pages, etc.
          6. Testing the response of the code (and/or the ops team) to simulated failures.
          7. Exploring the configuration space of the cluster. That is, the combination of options in the *-site.xml files, and the servers/network on which Hadoop runs. This is surprisingly hard to do thoroughly, and it isn't done at scale right now. For example, I don't think anyone tests to see what happens on a big cluster when you set the replication factor to 10 for a big job, or crank it back to 1.

          It would be good to have a way to test all of this -or at least have the foundation for doing so.

          Now, have I left any use cases out?

          Like I said, I'd love a skype-based phone conf on the topic, the people who have done stuff in this area can talk about what they've done.

          Show
          steve_l added a comment - There's a number of use cases that a big test framework can handle, and while they shouldn't be interdependent, it would be nice to have tests that work with all Bringing up Hadoop clusters by asking IaaS systems for the machines, instantiating the cluster, then testing it to see it works. This is what I do. I normally just run Paolo Castagna's citerank code against the cluster; its a small dataset MR sequence that can take a couple of hours to run through. Testing that the latest build works on a pre-allocated physical/virtual cluster. You don't need to ask for the machines, you may need to push out the JARs/RPMs Testing that physical cluster works at the speeds to be expected from the #of disks and cores. Testing that MR algorithms work and work at scale Testing all the corner bits of Hadoop. The code, the web pages, etc. Testing the handling of the code (and/or opts team ) to simulated failures Exploring the configuration space of the cluster. That is the combination of options of the -site.xml files, and the servers/network on which Hadoop runs. This is surprisingly hard to do thoroughly, and it isn't done at scale right now. For example, I dont think anyone tests to see what happens on a big cluster when you set the replication factor to 10 for a big job, or crank it back to 1. It would be good to have a way to test all of this -or at least have the foundation for doing so. Now, have I left any use cases out? Like I said, I'd love a skype-based phone conf on the topic, the people who have done stuff in this area can talk about what they've done.
          Hemanth Yamijala added a comment -

          I agree with Konstantin. With what we already have, I think we can develop useful automated tests that run on a large cluster, thereby reducing the start-up time - a huge step forward from where we are in Hadoop currently.

          Konstantin Boudnik added a comment -

          I'm not sure JUnit is ideal...

          We need something that provides basic test-harness functionality, such as test/suite start and stop. JUnit has its ups and downs: the main benefit is that we already have it all over the place. On the other hand, there aren't many alternatives. TestNG might be a candidate, but it has a HUGE disadvantage: it doesn't support per-test VM forking.

          Also, I'd prefer to keep a number of tools at minimum, if possible.

          steve_l added a comment -

          correction: s/Aaron/Alex/

          Other thing: these entry points effectively become the way to start/stop Hadoop clusters.

          steve_l added a comment -

          Some initial thoughts.

          • Yes, this is good; maybe we should have a skype conf or something on the topic, so everyone can show what they have already. I know Aaron's done some work.
          • I'm not sure JUnit is ideal, because its test reports don't scale up to aggregated tests from different machines, different logs, partial failures. But it is a great way to start tests from the IDE/build too.
          • If we could move all tests against a functional cluster into JARs that run against a live cluster, they could be used for some of the system collaboration work, and let people test against different hadoop deployments (physical, VM with RPMs installed, etc.)
          • I would split cluster setup/teardown from the tests themselves for that reason, and because the startup and teardown delays are why the normal tests take so long. Tests that rely on a working cluster are different from those that push the cluster through its lifecycle and explore the corner cases of the cluster/hadoop configuration space.
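          The setup/teardown split suggested above could look like the following sketch: tests locate a pre-deployed cluster advertised through a system property instead of starting one themselves, so the slow cluster lifecycle lives outside the test run. The property name `test.cluster.jobtracker` is hypothetical:

```java
// Sketch: a shared fixture that assumes the cluster lifecycle is managed
// externally (deploy scripts, IaaS provisioning, etc.), not by the tests.
public class ClusterFixture {
    // Return the address of the externally started jobtracker, or fall
    // back to "local" when no live cluster has been advertised.
    static String jobTrackerAddress() {
        String addr = System.getProperty("test.cluster.jobtracker");
        return (addr != null) ? addr : "local";
    }

    public static void main(String[] args) {
        // Lifecycle tests would start/stop the cluster themselves; tests
        // that merely rely on a working cluster just read this and run.
        System.out.println(jobTrackerAddress());
    }
}
```

          With this split, the harness invocation (e.g. `-Dtest.cluster.jobtracker=jt-host:9001`) chooses the deployment, and the same test JAR runs unchanged against physical or virtual clusters.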
          Arun C Murthy added a comment -

          Some utility apis to provide a flavour for what we are trying to accomplish:

            /**
             * Sources of logs and outputs.
             */
            public enum LogSource {
              NAMENODE,
              DATANODE,
              JOBTRACKER,
              TASKTRACKER,
              TASK
            }
          
            /**
             * Setup a Hadoop Cluster.
             * @param conf {@link Configuration} for the cluster
             * @throws IOException
             */
            public static void setupCluster(Configuration conf) throws IOException;
            
            /**
             * Tear down the Hadoop Cluster
             * @param conf {@link Configuration} for the cluster
             * @throws IOException
             */
            public static void tearDownCluster(Configuration conf) throws IOException;
          
            /**
             * Kill all Hadoop Daemons running on the given rack.
             * @param rackId rack on which all map-reduce daemons should be killed
             * @throws IOException
             * @throws InterruptedException
             */
            public static void killRack(Cluster cluster, String rackId) 
            throws IOException, InterruptedException;
          
            /**
             * Fetch logs from the hadoop daemon from <code>startTime</code> to 
             * <code>endTime</code> and place them in <code>dst</code>.
             * @param cluster Map-Reduce {@link Cluster}
             * @param daemon hadoop daemon from which to fetch logs
             * @param startTime start time
             * @param endTime end time
             * @param dst destination for storing fetched logs
             * @throws IOException
             */
            public static void fetchDaemonLogs(Cluster cluster, Testable daemon, 
                                               long startTime, long endTime, 
                                               Path dst) 
            throws IOException;
          
            /**
             * Fetch daemon logs and check if they contain the <code>pattern</code>.
             * @param cluster map-reduce <code>Cluster</code>
             * @param source log source
             * @param startTime start time
             * @param endTime end time
             * @param pattern pattern to check
             * @param fetch if <code>true</code> fetch the logs into <code>dir</code>,
             *              else do not fetch
             * @param dir directory to place the fetched logs
             * @return <code>true</code> if the logs contain <code>pattern</code>,
             *         <code>false</code> otherwise
             * @throws IOException
             */
            public static boolean checkDaemonLogs(Cluster cluster, 
                                                  LogSource source,
                                                  long startTime, long endTime,
                                                  String pattern,
                                                  boolean fetch, Path dir)
            throws IOException;
          
          

          It's very likely each of these utility methods will turn around and call shell scripts etc. to actually accomplish the desired functionality... it's convenient that the person implementing a specific test case needn't worry about those details and can continue to work in the familiar JUnit environment (for Hadoop devs).
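          To illustrate the testable core behind an API like checkDaemonLogs(), here is a hypothetical sketch of the pattern-matching step over already-fetched log lines; actually fetching the logs from live daemons (the shell-script part) is elided:

```java
import java.util.*;
import java.util.regex.*;

// Sketch of the scanning step inside a checkDaemonLogs()-style utility:
// given log lines already pulled from a daemon between startTime and
// endTime, report whether any line matches the pattern.
public class LogCheck {
    static boolean containsPattern(List<String> lines, String pattern) {
        Pattern p = Pattern.compile(pattern);
        for (String line : lines) {
            if (p.matcher(line).find()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Sample lines standing in for fetched jobtracker/tasktracker logs.
        List<String> logs = Arrays.asList(
            "2009-10-27 10:01:02 INFO JobTracker: heartbeat from tt-17",
            "2009-10-27 10:01:05 WARN TaskTracker: lost task attempt_001");
        System.out.println(containsPattern(logs, "lost task"));   // true
        System.out.println(containsPattern(logs, "FATAL"));       // false
    }
}
```

          A test would then assert on the boolean, e.g. fail the case if a FATAL pattern appears in the namenode logs during the test window.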


            People

            • Assignee: Konstantin Boudnik
            • Reporter: Arun C Murthy
            • Votes: 0
            • Watchers: 33