  Bigtop / BIGTOP-1342

Make TestCLI usable for both HDFS and HCFS

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.0
    • Fix Version/s: 0.8.0
    • Component/s: tests
    • Labels:
      None

      Description

      TestCLI test cases are currently only runnable on HDFS. Since most of the
      test cases are applicable to any Hadoop filesystem, it makes sense to
      make them generic in the HCFS sense.

      While most test cases are HCFS generic, some cases apply only to HDFS,
      so I propose splitting the current code into:

      • a general HCFS superclass (with most cases in the testHCFSConf.xml file)
      • an HDFS-specific subclass (with HDFS-only cases in testHDFSConf.xml)

      I would like to keep testHCFSConf.xml as a common base for any Hadoop
      filesystem, which requires introducing several additional variables to
      capture minor differences between GlusterFS and HDFS. This should be good
      enough for other HCFS implementations as well, but I haven't tested that.
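
      To illustrate the proposed split, here is a minimal self-contained sketch;
      the class skeletons and the getTestFile() hook are illustrative only and show
      the intended superclass/subclass relationship, not the actual Bigtop code:

        // Illustrative sketch of the proposed split (hypothetical names).
        abstract class CliTestBase {
            // Each test class points at its own XML file with CLI test cases.
            protected abstract String getTestFile();
        }

        // HCFS-generic tests: cases applicable to any Hadoop filesystem.
        class TestCLI extends CliTestBase {
            protected String getTestFile() { return "testHCFSConf.xml"; }
        }

        // HDFS-only tests: the few cases that do not apply to other filesystems.
        class TestHDFSCLI extends TestCLI {
            protected String getTestFile() { return "testHDFSConf.xml"; }
        }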

      Before proposing a patch for this, it makes sense to have BIGTOP-1341
      resolved first.

      Attachments

      1. BIGTOP-1342.1.patch
        1.88 MB
        Martin Bukatovic
      2. BIGTOP-1342.2.patch
        1.26 MB
        Martin Bukatovic

          Activity

          mbukatov Martin Bukatovic added a comment -

          Flagging prerequisites properly.

          mbukatov Martin Bukatovic added a comment -

          Attaching initial patch for comments.

          mbukatov Martin Bukatovic added a comment - edited

          The idea behind the initial patch (BIGTOP-1342.1.patch):

          This patch expects that the patch proposed in BIGTOP-1341 is applied.

          The classes TestCLI.java and FSCmdExecutor.java were changed to be HCFS
          compliant and moved from the hdfs module into the hcfs module. The full
          class name of TestCLI is now org.apache.bigtop.itest.hadoop.hcfs.TestCLI.

          Being in the hcfs module, TestCLI can be used to test any Hadoop filesystem
          such as HDFS or GlusterFS (no other filesystems were checked, though). That
          said, HDFS is still the primary filesystem for TestCLI, which means that you
          can run TestCLI on HDFS and it will work without any additional configuration.
          On the other hand, to run TestCLI on GlusterFS you need to set several
          properties I have introduced (you don't need to do this with HDFS because
          TestCLI has hardcoded HDFS-specific defaults for all of them). TestCLI has its
          test cases defined in testHCFSConf.xml.

          Since some test cases are HDFS specific (not applicable to any other Hadoop
          fs), I moved those test cases into a separate file, testHDFSConf.xml, and
          added TestHDFSCLI (a subclass of TestCLI) to the hdfs module. There are just
          a few test cases in this category, but I expect that all the dfsadmin ones
          from BIGTOP-1334 will end up here.

          The idea is that with HDFS you would run both TestCLI and TestHDFSCLI, while
          with GlusterFS (or any other Hadoop compatible fs) you would run just TestCLI.

          Changes to TestCLI to make it HCFS ready:

          • HCFS fs.default.name hack (see the TestCLI.java source for more details)
          • added property hcfs.namenode.pattern: regexp to match the namenode
            (sometimes TestCLI expects to see the namenode hostname with port number in
            the output, but this is not applicable to other filesystems)
          • added property hcfs.root.groupname: root group
            (the root group is defined by the Hadoop config property
            dfs.permissions.superusergroup for HDFS, while it's always just root for
            GlusterFS)
          • added property hcfs.dirsize.pattern: regexp to match the expected dir size
            (this is because HDFS reports zero for the size of directories while
            GlusterFS doesn't)
          • added property hcfs.scheme: defines the fs scheme (e.g. "hdfs:", "glusterfs:")
          • removed a few HDFS-specific cases from testHCFSConf.xml (moved to
            testHDFSConf.xml)

          Note: for GlusterFS test runs, I use the following values:

            -Dhcfs.root.username=root
            -Dhcfs.root.groupname=root
            -Dhcfs.scheme=glusterfs:
            -Dhcfs.dirsize.pattern='[0-9]+'
            -Dhcfs.namenode.pattern='' 
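
          To show how these overrides are meant to behave, here is a minimal
          self-contained sketch of the property lookup. The constant names and the
          literal defaults below are illustrative only; the actual TestCLI derives the
          HDFS defaults from the cluster configuration (e.g.
          dfs.permissions.superusergroup) rather than hardcoding them:

            // Illustrative sketch: each hcfs.* system property falls back to an
            // HDFS-oriented default when it is not set on the command line.
            public class HcfsTestProperties {
                static final String SCHEME =
                    System.getProperty("hcfs.scheme", "hdfs:");
                static final String SUPERGROUP =
                    System.getProperty("hcfs.root.groupname", "supergroup");
                static final String DIRSIZE_PATTERN =
                    System.getProperty("hcfs.dirsize.pattern", "0");
                static final String NAMENODE_PATTERN =
                    System.getProperty("hcfs.namenode.pattern", "\\w+[-.a-z0-9]*(:[0-9]+)?");

                public static void main(String[] args) {
                    // Running with -Dhcfs.scheme=glusterfs: etc. overrides the defaults.
                    System.out.println(SCHEME + " " + SUPERGROUP + " "
                        + DIRSIZE_PATTERN + " " + NAMENODE_PATTERN);
                }
            }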
          
          mbukatov Martin Bukatovic added a comment -

          Flagging for review. Comments welcome.

          jayunit100 jay vyas added a comment -

          Thanks Martin! I'll review this shortly and test it on HDFS for you as well.
          And in general I don't think dfs.permissions..... should be required for non-HDFS tests.

          jayunit100 jay vyas added a comment -

          Overall I looked at the code - what you did makes sense.

          The XML stuff is brutally massive - thousands of lines or something.

          This will be tough to review. Possibly an interactive review would be nice.

          Or maybe I can just spin up a vagrant cluster, apply the patch, and run the thing from start to finish.

          But in the meantime: is there any way you can reduce the amount of changes so that we can more easily review the commit?

          mbukatov Martin Bukatovic added a comment - edited

          Oh, yes. You are right, the change to the xml file is too big to review.

          To make this change a little easier to consume, here is what I did with
          the test config file:

          • create HCFS test file:
            cp testHDFSConf.xml testHCFSConf.xml
            
          • introduce HCFS_SCHEME variable:
            sed -i 's/hdfs:/HCFS_SCHEME/g' testHCFSConf.xml
            
          • introduce HCFS_DIRSIZE variable:
            sed -i '/<expected-output>^drwx/s/0/HCFS_DIRSIZE/' testHCFSConf.xml
            
          • introduce HCFS_NNMATCH variable:
            sed -i 's!HCFS_SCHEME//\\w+\[-\.a-z0-9\]\*(:\[0-9\]+)?!HCFS_SCHEME//HCFS_NNMATCH!' testHCFSConf.xml
            
          • disabled test cases not applicable to HCFS by commenting out cases 242 294 295 296 297 301 302 305 306 in testHCFSConf.xml
          • removed all HCFS test cases from testHDFSConf.xml (testHDFSConf.xml contains just test cases 242 294 295 296 297 301 302 305 306 now)
          • left the original version of the test config file available as testHCFSConf.pristine.xml
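
          For context, here is a small self-contained sketch of how such placeholders
          could be expanded at test time. The helper class below is hypothetical; the
          real substitution happens inside Bigtop's TestCLI/FSCmdExecutor and may work
          differently:

            import java.util.Map;

            // Hypothetical helper: expands HCFS_* placeholders in an expected-output
            // string from testHCFSConf.xml before it is compared to command output.
            public class HcfsPlaceholderExpander {
                public static String expand(String expected, Map<String, String> values) {
                    String result = expected;
                    for (Map.Entry<String, String> e : values.entrySet()) {
                        result = result.replace(e.getKey(), e.getValue());
                    }
                    return result;
                }

                public static void main(String[] args) {
                    Map<String, String> glusterfs = Map.of(
                        "HCFS_SCHEME", "glusterfs:",
                        "HCFS_DIRSIZE", "[0-9]+",
                        "HCFS_NNMATCH", "");
                    // e.g. an expected-output value taken from the XML test config:
                    System.out.println(expand("HCFS_SCHEME//HCFS_NNMATCH/user/file", glusterfs));
                }
            }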

          You can also review the changes in my personal devel branch on GitHub, which
          contains separate patches for each change:
          https://github.com/mbukatov/bigtop/commits/local_BIGTOP-1342-devel

          Please note that this change expects BIGTOP-1341 to be resolved. I split the
          effort into two JIRAs because it would be hard to discuss such a big change in
          a single JIRA; moreover, I expected that this hcfs patch would need some
          tweaking, while the TestCLI cleanup would be easier to get merged.

          jayunit100 jay vyas added a comment -

          Martin: can I suggest the following?

          1) Do you think we should create a JIRA in Hadoop which indicates that there should be a superuser parameter in the hadoop-common namespace? (HDFS shouldn't own the notion of a superuser; other file systems have superusers too!)

          2) Can we also suggest an alternative way of defining the superuser for this test? I think it's a little self-defeating to depend on an HDFS parameter when we're trying to make the tests HCFS compliant, but I totally see why you are using it; let me know if I'm missing something.

          3) Does this take into account the semantics we agreed on in the BIGTOP-1200 patch? (Just a quick thought.)

          Meanwhile, I'm still working on spinning up some sturdy VMs for testing all this awesome new HCFS testing work.

          Thanks!

          mbukatov Martin Bukatovic added a comment -

          Do you think we should create a JIRA in Hadoop which indicates that there should be a superuser parameter in the hadoop-common namespace? (HDFS shouldn't own the notion of a superuser; other file systems have superusers too!)

          So are you suggesting moving dfs.permissions.superusergroup from hdfs-site.xml into something like hcfs.permissions.superusergroup in, let's say, core-site.xml or even hcfs-site.xml? While I'm not sure it would be easy to convince Hadoop upstream about this, creating a JIRA for it in Hadoop is actually a good idea. This issue definitely deserves an official upstream solution applicable to any Hadoop filesystem. Do we have other dfs properties like this one that are relevant to GlusterFS?

          Can we also suggest an alternative way of defining the superuser for this test? I think it's a little self-defeating to depend on an HDFS parameter when we're trying to make the tests HCFS compliant, but I totally see why you are using it; let me know if I'm missing something.

          I'm using it because I need to keep HDFS compatibility, but you can redefine it using the hcfs.root.groupname property. So I would say that the patch already provides an alternative way of defining the superuser. See the code:

          supergroup = System.getProperty("hcfs.root.groupname", conf.get(DFSConfigKeys.DFS_PERMISSIONS_SUPERUSERGROUP_KEY));
          

          Does this take into account the semantics we agreed on in the BIGTOP-1200 patch? (Just a quick thought.)

          Well, I expect that the superuser and supergroup for a GlusterFS run are root (see the properties -Dhcfs.root.username=root -Dhcfs.root.groupname=root in the comment above). Or do you have something else to point out?

          jayunit100 jay vyas added a comment -

          1) As part of this ticket, I think it's our duty then to create a JIRA for this in upstream Hadoop, and debate there what the solution for DFSConfigKeys.DFS_PERMISSIONS_SUPERUSERGROUP_KEY is.

          2) FWIW, "root_user":"hdfs" was what we called it in BIGTOP-1200.

          3) Sorry, I will get a Bigtop VM spun up tomorrow to actually run this! I have been very busy this week.

          mbukatov Martin Bukatovic added a comment -

          As part of this ticket, I think it's our duty then to create a JIRA for this in upstream Hadoop, and debate there what the solution for DFSConfigKeys.DFS_PERMISSIONS_SUPERUSERGROUP_KEY is.

          Agreed, I have created HADOOP-10779 so that we can move the discussion there.

          FWIW, "root_user":"hdfs" was what we called it in BIGTOP-1200.

          Oh, yes. Would it be OK to rename the hcfs.root.username property to hcfs.root_user and hcfs.root.groupname to hcfs.root_group then (since a property name is expected to have some prefix)?

          Sorry, I will get a Bigtop VM spun up tomorrow to actually run this! I have been very busy this week.

          No problem, the important thing is that we are moving forward.

          jayunit100 jay vyas added a comment -

          Testing this now that BIGTOP-1341 is in.

          jayunit100 jay vyas added a comment -

          Okay! Functionally, it worked out of the box: 92% pass, 7% fail, like in BIGTOP-1341.

          Now, there are some significant whitespace complaints, etc., when applying the patch.

          Martin Bukatovic, do you want to try to fix those whitespace errors and comment on the ones which are unfixable? (I see errors, for example, in patch lines 84, 547, 583, 605, 638 when I run "git am BIGTOP-1342.1.patch" on master.)

          The way I do formatting of Groovy code is here: https://issues.apache.org/jira/browse/BIGTOP-1240?focusedCommentId=13920396&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13920396

          Anyways, once those are fixed, as far as I can tell, we can push this guy in just in time for the 0.8.0 release; it will be cool to announce that this release includes major HCFS improvements in the tests.

          mbukatov Martin Bukatovic added a comment -

          Ok, attaching a new patch, BIGTOP-1342.2.patch, with the following changes:

          • Removed the bigtop-tests/test-artifacts/hadoop/src/main/resources/clitest_data/testHDFSConf.pristine.xml file (the original xml file, stored just for reference and not directly used from the code)
          • Fixed whitespace warnings: removed trailing whitespace with sed and rebased (via squash) the result onto the first patch
            sed -i 's/[\t\ ]*$//' testHDFSConf.xml
            

            This means that if you compare the 2nd patch with the first one, you will see a lot of whitespace changes.

          • Updated the commit message.

          So I'm declaring this patch final, but before committing it to master, please recheck that updating the maven pom files is not needed (I think it should not be necessary, but I would not call myself a maven expert).

          jayunit100 jay vyas added a comment -

          For those too busy to read these changes, I'll summarize here:

          1) First, there are +/- changes to FSCmdExecutor.java. That is because we move the package.
          2) Then, we see a similar thing with TestCLI (the new HCFS compliant test class).
          3) The tests which are HDFS specific are now in TestHDFSCLI.
          4) There are now 2 XML files (testHCFSConf.xml and testHDFSConf.xml).

          Overall this is a good patch and a very clean solution IMO: we have now peeled out HDFS-specific operations entirely, so anyone with an HCFS can do really rigorous testing of their file system.

          I'll do a full review and test again tonight. Until then, any objections to committing if it's all working?

          jayunit100 jay vyas added a comment -

          +1, tested on HDFS, and committed. Thanks Martin!

          TestCLI now runs on any Hadoop Compatible FileSystem.


            People

            • Assignee:
              mbukatov Martin Bukatovic
            • Reporter:
              mbukatov Martin Bukatovic
            • Votes:
              0
            • Watchers:
              3
