Bigtop / BIGTOP-1032

Refactor hdfs/TestCLI and hdfs/TestAppend to be generic filesystem tests

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.7.0
    • Fix Version/s: backlog
    • Component/s: tests
    • Labels:
      None

      Description

      The TestCLI and TestAppend classes both have hard references to an HDFS user shell, although most of their functionality is actually geared towards testing the abstract FileSystem interface.

      Removing a couple of lines from TestCLI:

      String[] createTestcliDirCmds = {
          "hadoop fs -mkdir -p " + TEST_DIR_ABSOLUTE,
          "hadoop fs -chmod 777 " + TEST_DIR_ABSOLUTE
      };
      shHDFS.exec(createTestcliDirCmds);

      And this line from TestAppend:

      shHDFS.exec("hadoop fsck /user/$USERNAME/$testAppendInput/test2.file$date");

      This will essentially allow Bigtop suites to be used to validate filesystem functionality in clusters with heterogeneous HCFS (https://wiki.apache.org/hadoop/HCFS/) implementations.
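      A minimal sketch of the proposed generalization, assuming a hypothetical `test.fs.shell` system property (the property name is invented here) to replace the hardcoded `hadoop fs` prefix:

      ```java
      // Minimal sketch (not Bigtop code): the shell prefix becomes a parameter
      // read from a hypothetical "test.fs.shell" system property, defaulting
      // to the current hardcoded value, so the same commands can target any
      // HCFS implementation.
      public class GenericFsCmds {
          static String[] createTestDirCmds(String fsShell, String dir) {
              return new String[] {
                  fsShell + " -mkdir -p " + dir,
                  fsShell + " -chmod 777 " + dir
              };
          }

          public static void main(String[] args) {
              String fsShell = System.getProperty("test.fs.shell", "hadoop fs");
              for (String cmd : createTestDirCmds(fsShell, "/tmp/testcli")) {
                  System.out.println(cmd);
              }
          }
      }
      ```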

      1. TestCLI.java
        4 kB
        Martin Bukatovic
      2. FSCmdExecutor.java
        2 kB
        Martin Bukatovic

        Issue Links

          Activity

          Transition: Open → Resolved
          Time in source status: 333d 12h 33m
          Execution times: 1
          Last executor: Martin Bukatovic
          Last execution date: 18/Jun/14 10:00
          Martin Bukatovic made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Duplicate [ 3 ]
          Martin Bukatovic added a comment - edited

          And closing as duplicate.

          Martin Bukatovic added a comment -

          Ok, flagging this issue to be resolved in BIGTOP-1342 instead.

          Martin Bukatovic made changes -
          Link This issue is superseded by BIGTOP-1342 [ BIGTOP-1342 ]
          Konstantin Boudnik made changes -
          Assignee Martin Bukatovic [ mbukatov ]
          jay vyas added a comment -

          Sure! I guess just flag the link as "superseded by" your new jira and set to closed

          Martin Bukatovic added a comment -

          I have figured out how to run TestCLI with another Hadoop filesystem, but the approach is quite different from what has been described here so far. I would personally prefer to close this JIRA and open a new one with a new description of the proposed changes, to make the discussion about the changes easier to track and follow. Would that be OK with you?

          jay vyas made changes -
          Link This issue relates to BIGTOP-1321 [ BIGTOP-1321 ]
          jay vyas added a comment -

          It appears that other errors running tests can occur when we expect to be able to launch a shell as user hdfs on a running system.

          jay vyas made changes -
          Link This issue is related to BIGTOP-1307 [ BIGTOP-1307 ]
          Konstantin Boudnik made changes -
          Component/s Tests [ 12315617 ]
          Konstantin Boudnik made changes -
          Affects Version/s 0.7.0 [ 12324362 ]
          Konstantin Boudnik made changes -
          Fix Version/s backlog [ 12324373 ]
          Martin Bukatovic added a comment -

          Thank you for the feedback!

          I created the fork as suggested, so you can find the changes in this branch:
          https://github.com/mbukatov/bigtop/tree/BIGTOP-1032

          Regarding 3) - you are right here; I just reused what the previous version used, but I will definitely address that.

          jay vyas added a comment -

          1) Regarding the architecture, this makes sense to me: it will be super nice to modularize the tests, as this is a very complicated and long test suite. And Bigtop has the advantage of moving faster than Hadoop, so it can layer over the tests with more modular logic without having to directly modify them too much.

          2) Usefulness: to confirm that this is possible to control through Maven alone, see the excludedGroups option: http://maven.apache.org/surefire/maven-failsafe-plugin/integration-test-mojo.html#excludedGroups

          3) Implementation: a minor suggestion: instead of System.getProperty (in TestCLI.java above), why not just read from HADOOP_CONF_DIR, which should be set in any Bigtop test? That would follow the idiom that all the other smokes follow.

          And by the way Martin Bukatovic, if you want to put your code in a GitHub fork, it might be easier for us to monitor your progress on this until your patch is ready.

          THANKS! Can't wait to use these modularized HDFS vs HCFS tests on our clusters.
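          Suggestion 3) above might be sketched like this (the fallback path is an assumption, not Bigtop code):

          ```java
          // Illustrative sketch only: resolve the config directory from the
          // HADOOP_CONF_DIR environment variable that Bigtop test runs set,
          // falling back to a conventional default when it is unset.
          public class ConfDirResolver {
              static String resolveConfDir(String envValue) {
                  return (envValue == null || envValue.isEmpty())
                          ? "/etc/hadoop/conf"   // assumed conventional default
                          : envValue;
              }

              public static void main(String[] args) {
                  System.out.println(resolveConfDir(System.getenv("HADOOP_CONF_DIR")));
              }
          }
          ```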

          Martin Bukatovic made changes -
          Field Original Value New Value
          Attachment FSCmdExecutor.java [ 12603059 ]
          Attachment TestCLI.java [ 12603060 ]
          Martin Bukatovic added a comment -

          example code mentioned in my comment

          Martin Bukatovic added a comment -

          I went through the code of the HDFS test cases and it seems to me that more
          changes are needed. I propose the following:

          1) Change TestCLI.java (and FSCmdExecutor.java, which is used by TestCLI) to
          use the hadoop executable (/usr/lib/hadoop/bin/hadoop) instead of using
          org.apache.hadoop.fs.FsShell directly. The choice of which Hadoop filesystem
          will be used can be made by setting a few system properties. This way we
          solve several problems:

          • fs-specific configuration doesn't need to be included in the TestCLI
            code; we assume that these tests are run on a properly configured Hadoop
            cluster.
          • there is zero additional work (besides specifying the properties)
            to run this test with any Hadoop-compatible fs. E.g. in the current
            state, to make it run with glusterfs, one would have to rewrite the test
            code and edit pom.xml so that the classpath includes the jar with the
            glusterfs implementation (which is redundant, as you already specified
            this in core-site.xml).
          • and last but not least, using the command line is more appropriate for
            this kind of testing anyway.

          My first attempt at how this may look is attached - it's not a patch; I
          changed FSCmdExecutor.java and TestCLI.java - and it's not fully working
          code yet.
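          The property-driven command construction in 1) might look roughly like this (property names are invented for illustration; the `-fs` generic option of FsShell is used to select the target filesystem):

          ```java
          // Rough sketch of the proposal: build the external command from the
          // hadoop binary path and an optional filesystem URI, instead of
          // invoking org.apache.hadoop.fs.FsShell in-process. Property names
          // (test.hadoop.bin, test.fs.uri) are illustrative assumptions.
          public class HadoopCmdBuilder {
              static String buildFsCommand(String hadoopBin, String fsUri, String fsArgs) {
                  StringBuilder cmd = new StringBuilder(hadoopBin).append(" fs");
                  if (fsUri != null) {
                      cmd.append(" -fs ").append(fsUri); // pick a non-default filesystem
                  }
                  return cmd.append(' ').append(fsArgs).toString();
              }

              public static void main(String[] args) {
                  String bin = System.getProperty("test.hadoop.bin", "/usr/lib/hadoop/bin/hadoop");
                  String fs = System.getProperty("test.fs.uri"); // null -> cluster default
                  System.out.println(buildFsCommand(bin, fs, "-ls /"));
              }
          }
          ```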

          2) Most of the other tests are HDFS-specific and as such should not be run
          on Hadoop pools using alternative Hadoop-compatible filesystems:

          • TestDFSAdmin.groovy
          • TestFsck.groovy
          • TestFuseDFS.groovy
          • TestHDFSBalancer.groovy
          • TestHDFSQuota.groovy

          On the other hand, TestDistCpIntra.groovy, TestFileAppend.groovy and
          TestTextSnappy.groovy may be executed even on a non-HDFS pool, but may
          require minor updates (I didn't try this yet, though).

          To make it work with any HCFS, I would create two JUnit groups, one for
          HDFS-specific tests, the other for HCFS tests - but I'm not sure how this
          would fit into the current workflow. Any better ideas here?

          Does this whole idea make sense to you? What would you change?
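          The two-group split might be wired up roughly like this in the failsafe plugin configuration (the group interface name is invented for illustration):

          ```xml
          <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-failsafe-plugin</artifactId>
            <configuration>
              <!-- when validating an alternative HCFS, skip HDFS-only tests -->
              <excludedGroups>org.apache.bigtop.itest.HdfsOnlyTests</excludedGroups>
            </configuration>
          </plugin>
          ```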

          jay vyas added a comment -

          Fair enough. Any other thoughts on this? It seems essentially like we can now work on a patch.

          Konstantin Boudnik added a comment -

          I prefer the second way: have a common functionality sitting in the parent and subclass as needed.

          jay vyas added a comment -

          Use runtime reflection to add HDFS-specific checks, and
          for HCFS systems, run the generic tests.

          Or else, have different subclasses for different filesystems
          that implement specific functionality.

          I agree: decoupling shouldn't genericize the HDFS parts.
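          The subclassing option could be sketched like this (class and check names are invented; the point is that HDFS-only coverage is added on top of the generic checks rather than genericized away):

          ```java
          // Sketch of the subclassing option: generic HCFS checks live in a base
          // class, and an HDFS subclass extends the list with HDFS-only checks.
          public class SubclassSketch {
              static class HcfsSmokeChecks {
                  java.util.List<String> checks() {
                      return new java.util.ArrayList<>(java.util.Arrays.asList(
                              "fs -mkdir -p /tmp/t", "fs -chmod 777 /tmp/t"));
                  }
              }

              static class HdfsSmokeChecks extends HcfsSmokeChecks {
                  @Override
                  java.util.List<String> checks() {
                      java.util.List<String> all = super.checks();
                      all.add("fsck /tmp/t"); // HDFS-specific coverage kept
                      return all;
                  }
              }

              public static void main(String[] args) {
                  System.out.println(new HdfsSmokeChecks().checks().size());
              }
          }
          ```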

          Konstantin Boudnik added a comment -

          If removing leads to reduced coverage, then I am not a happy camper.

          jay vyas created issue -

            People

            • Assignee: Martin Bukatovic
            • Reporter: jay vyas
            • Votes: 0
            • Watchers: 5

              Dates

              • Created:
                Updated:
                Resolved:
