Bigtop / BIGTOP-1032

Refactor hdfs/TestCLI and hdfs/TestAppend to be generic filesystem tests

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.7.0
    • Fix Version/s: backlog
    • Component/s: Tests
    • Labels:
      None

      Description

      The TestCLI and TestAppend classes both have hard references to an HDFS user shell, although most of their functionality is actually geared towards testing the abstract FileSystem interface.

      This means removing a couple of lines from TestCLI:

      String[] createTestcliDirCmds = {
          "hadoop fs -mkdir -p " + TEST_DIR_ABSOLUTE,
          "hadoop fs -chmod 777 " + TEST_DIR_ABSOLUTE
      };
      shHDFS.exec(createTestcliDirCmds);

      And this line from TestAppend:

      shHDFS.exec("hadoop fsck /user/$USERNAME/$testAppendInput/test2.file$date");

      This will essentially allow Bigtop suites to be used to validate filesystem functionality in clusters with heterogeneous HCFS (https://wiki.apache.org/hadoop/HCFS/) implementations.
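As a rough sketch of the decoupling the description asks for, the hardcoded `hadoop fs` calls above could be built against a configurable filesystem URI. The class and parameter names here are hypothetical, not the actual Bigtop code; `hadoop fs -fs <uri>` is the shell's generic option for selecting a filesystem.

```java
// Hypothetical sketch: build the test-setup commands against whatever
// filesystem URI the test is configured with, instead of hardcoding HDFS.
class FsCommandBuilder {
    private final String fsUri; // e.g. "hdfs://namenode:8020" or a glusterfs URI

    FsCommandBuilder(String fsUri) {
        this.fsUri = fsUri;
    }

    // "-fs <uri>" is a generic hadoop shell option, so the same command
    // string works for any configured FileSystem implementation.
    String mkdir(String dir) {
        return "hadoop fs -fs " + fsUri + " -mkdir -p " + dir;
    }

    String chmod(String mode, String dir) {
        return "hadoop fs -fs " + fsUri + " -chmod " + mode + " " + dir;
    }
}
```

With a builder like this, TestCLI's setup would be the same two commands for HDFS, glusterfs, or any other HCFS, with only the URI changing.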

      1. FSCmdExecutor.java
        2 kB
        Martin Bukatovic
      2. TestCLI.java
        4 kB
        Martin Bukatovic

        Activity

        jay vyas created issue -
        Konstantin Boudnik added a comment -

        If removing them leads to reduced coverage, then I am not a happy camper.

        jay vyas added a comment -

        Either use runtime reflection to add HDFS-specific checks and, for HCFS
        systems, run the generic tests; or else have different subclasses for
        different filesystems that implement the specific functionality.

        I agree: decoupling shouldn't genericize the HDFS parts.
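The subclassing option mentioned here could look roughly like the following. This is a sketch under assumed names (neither class exists in Bigtop): a generic HCFS parent with an HDFS subclass layering on HDFS-only checks such as fsck.

```java
// Sketch of the subclassing idea: generic HCFS behavior in the parent,
// HDFS-specific behavior only in the subclass. Names are illustrative.
class TestFsAppendBase {
    // Generic append, valid for any Hadoop Compatible File System.
    String appendCommand(String path) {
        return "hadoop fs -appendToFile - " + path;
    }
}

class TestHdfsAppend extends TestFsAppendBase {
    // HDFS-specific: fsck exists only for HDFS, so it lives in the subclass
    // and never leaks into the generic test run.
    String fsckCommand(String path) {
        return "hadoop fsck " + path;
    }
}
```

An HCFS cluster would run only `TestFsAppendBase`; an HDFS cluster would run `TestHdfsAppend` and get both checks.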

        Konstantin Boudnik added a comment -

        I prefer the second way: have the common functionality sitting in the parent and subclass as needed.

        jay vyas added a comment -

        Fair enough. Any other thoughts on this? It seems like we can now work on a patch.

        Martin Bukatovic added a comment -

        I went through the code of the HDFS test cases and it seems to me that more
        changes are needed. I propose the following:

        1) Change TestCLI.java (and FSCmdExecutor.java, which is used by TestCLI) to
        use the hadoop executable (/usr/lib/hadoop/bin/hadoop) instead of using
        org.apache.hadoop.fs.FsShell directly. The choice of which Hadoop filesystem
        will be used can then be made by setting a few system properties. This way we
        solve several problems:

        • fs-specific configuration doesn't need to be included in the TestCLI
          code; we assume that these tests are run on a properly configured Hadoop
          cluster.
        • there is zero additional work (besides specifying the properties)
          to run this test with any Hadoop-compatible fs. E.g. in the current state,
          to make it run with glusterfs, one would have to rewrite the test code and
          edit pom.xml for the classpath to include the jar with the glusterfs
          implementation (which is redundant, as you already specified this in
          core-site.xml).
        • and last but not least, using the command line is more appropriate for
          this kind of testing anyway.
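The property-driven selection described above might be sketched as follows. The property name and fallback path are assumptions for illustration, not necessarily what the attached FSCmdExecutor.java/TestCLI.java use.

```java
// Sketch: the hadoop executable (and hence the filesystem under test) is
// chosen by system properties instead of compiled-in FsShell calls.
// The property name "test.hadoop.executable" is a hypothetical example.
class TestFsConfig {
    static String hadoopExecutable() {
        return System.getProperty("test.hadoop.executable",
                                  "/usr/lib/hadoop/bin/hadoop");
    }

    // Build a shell command for whatever fs the cluster is configured with;
    // no FsShell instance and no fs-specific classpath entries are needed.
    static String fsCommand(String args) {
        return hadoopExecutable() + " fs " + args;
    }
}
```

Running against a different HCFS would then only require pointing the property (and core-site.xml) elsewhere; the test code itself stays untouched.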

        My first attempt at how this may look is attached. It's not a patch (I
        changed FSCmdExecutor.java and TestCLI.java), and it's not fully working
        code yet.

        2) Most of the other tests are HDFS-specific and as such should not be run on
        Hadoop pools using alternative Hadoop-compatible filesystems:

        • TestDFSAdmin.groovy
        • TestFsck.groovy
        • TestFuseDFS.groovy
        • TestHDFSBalancer.groovy
        • TestHDFSQuota.groovy

        On the other hand, TestDistCpIntra.groovy, TestFileAppend.groovy and
        TestTextSnappy.groovy may be executed even on a non-HDFS pool, but may
        require minor updates (I didn't try this yet, though).

        To make it work with any HCFS, I would create two JUnit groups, one for
        HDFS-specific tests, the other for HCFS tests; but I'm not sure how this
        would fit into the current workflow. Any better ideas here?
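The two-group split could work along these lines. In JUnit this would normally be done with `@Category` marker interfaces plus the surefire/failsafe `excludedGroups` setting; the sketch below shows the same filtering idea with a plain runtime annotation so it stays self-contained, and the method names are illustrative only.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Marker for tests that must only run against real HDFS.
@Retention(RetentionPolicy.RUNTIME)
@interface HdfsOnly {}

class TestGroups {
    @HdfsOnly
    static void testFsck() {}   // HDFS-specific, skipped on generic HCFS

    static void testAppend() {} // generic, runs on any HCFS

    // Collect the test methods runnable on a generic HCFS cluster,
    // i.e. those not marked @HdfsOnly.
    static List<String> hcfsRunnable() {
        List<String> out = new ArrayList<>();
        for (Method m : TestGroups.class.getDeclaredMethods()) {
            if (m.getName().startsWith("test")
                    && m.getAnnotation(HdfsOnly.class) == null) {
                out.add(m.getName());
            }
        }
        return out;
    }
}
```

With real JUnit categories, Maven would do this filtering declaratively, so the group split should not need any changes to how the suites are launched.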

        Does this whole idea make sense to you? What would you change?

        Martin Bukatovic added a comment -

        example code mentioned in my comment

        Martin Bukatovic made changes -
        Field Original Value New Value
        Attachment FSCmdExecutor.java [ 12603059 ]
        Attachment TestCLI.java [ 12603060 ]
        jay vyas added a comment -

        1) Regarding the architecture, this makes sense to me: it will be super nice to modularize the tests, as this is a very complicated and long test suite. And Bigtop has the advantage of moving faster than Hadoop, so it can layer more modular logic over the tests without having to modify them directly too much.

        2) Usefulness: To confirm that this is possible to control through Maven alone, see the excludedGroups option: http://maven.apache.org/surefire/maven-failsafe-plugin/integration-test-mojo.html#excludedGroups .

        3) Implementation: a minor suggestion: instead of System.getProperty (in TestCLI.java above), why not just read from HADOOP_CONF_DIR, which should be set in any Bigtop test? That would follow the idiom that all the other smokes/ follow.
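The HADOOP_CONF_DIR idea suggested here might be sketched like this; the fallback path and helper name are assumptions, not existing Bigtop code.

```java
import java.util.Map;

// Sketch: resolve the cluster configuration directory from the environment
// (as the other smokes do) rather than from extra system properties.
// The fallback "/etc/hadoop/conf" is an assumed default for illustration.
class ConfDirResolver {
    static String confDir(Map<String, String> env) {
        String dir = env.get("HADOOP_CONF_DIR");
        return (dir != null && !dir.isEmpty()) ? dir : "/etc/hadoop/conf";
    }
}
```

In a real test this would be called with `System.getenv()`, so the filesystem under test is whatever core-site.xml in that directory configures.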

        And by the way, Martin Bukatovic, if you want to put your code in a GitHub fork, it might be easier for us to monitor your progress on this until your patch is ready.

        Thanks! Can't wait to use these modularized HDFS vs HCFS tests on our clusters.

        Martin Bukatovic added a comment -

        Thank you for the feedback!

        I created the fork as suggested, so you can find the changes in this branch:
        https://github.com/mbukatov/bigtop/tree/BIGTOP-1032

        Regarding 3), you are right here; I just reused what the previous version used, but will definitely address that.

        Konstantin Boudnik made changes -
        Fix Version/s backlog [ 12324373 ]
        Konstantin Boudnik made changes -
        Affects Version/s 0.7.0 [ 12324362 ]
        Konstantin Boudnik made changes -
        Component/s Tests [ 12315617 ]

          People

          • Assignee: Unassigned
          • Reporter: jay vyas
          • Votes: 0
          • Watchers: 5
