Uploaded image for project: 'Bigtop'
  1. Bigtop
  2. BIGTOP-1748

Remove assert on fs.trash.interval from TestCLI and TestDFSCLI

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.8.0
    • Fix Version/s: 1.0.0
    • Component/s: None
    • Labels:
      None

      Description

      Currently, the following assert causes these CLI tests (and TestHDFSCLI, which extends TestCLI) to fail on clusters where this configuration is set otherwise:

      Assert.assertEquals("HDFS trash should be disabled via fs.trash.interval",
              0, conf.getInt("fs.trash.interval", 0));

      I believe this isn't necessary for these tests, so let's remove it.

      1. BIGTOP-1748.patch
        3 kB
        Dasha Boudnik

        Issue Links

          Activity

          Hide
          dasha.boudnik Dasha Boudnik added a comment -

          Patch attached.

          Show
          dasha.boudnik Dasha Boudnik added a comment - Patch attached.
          Hide
          dasha.boudnik Dasha Boudnik added a comment -

          Martin Bukatovic, just looked and saw you're the one who introduced this assert. What was your reasoning? I may be missing something...

          Show
          dasha.boudnik Dasha Boudnik added a comment - Martin Bukatovic , just looked and saw you're the one who introduced this assert. What was your reasoning? I may be missing something...
          Hide
          jayunit100 jay vyas added a comment - - edited

          Hi dasha. im not sure why this happened but just for some context, martin works on HCFS stuff, trying to make sure in general that tests arent specific to HDFS.

          Is there any reason why HDFS trash settings to a positive value (which would result in, i guess, ? extra storage in some directories) would break ability to run the general HCFS tests? Possibly, since "trash" is not an offical part of the HCFS specifications (i.e. https://issues.apache.org/jira/secure/attachment/12572328/HadoopFilesystemContract.pdf )...

          UPDATE
          If zero, the trash feature is disabled. so my suspicion that this was an attempt to force disablement of the feature is likely correct... but not sure why? we will have to look more.

          Show
          jayunit100 jay vyas added a comment - - edited Hi dasha. im not sure why this happened but just for some context, martin works on HCFS stuff, trying to make sure in general that tests arent specific to HDFS. Is there any reason why HDFS trash settings to a positive value (which would result in, i guess, ? extra storage in some directories) would break ability to run the general HCFS tests? Possibly, since "trash" is not an offical part of the HCFS specifications (i.e. https://issues.apache.org/jira/secure/attachment/12572328/HadoopFilesystemContract.pdf )... UPDATE If zero, the trash feature is disabled. so my suspicion that this was an attempt to force disablement of the feature is likely correct... but not sure why? we will have to look more.
          Hide
          jayunit100 jay vyas added a comment -

          Another UPDATE also, looks like the comment sais there is another reason, which is that these tests are broken if the value is pre-set on the server ?

          Show
          jayunit100 jay vyas added a comment - Another UPDATE also, looks like the comment sais there is another reason, which is that these tests are broken if the value is pre-set on the server ?
          Hide
          dasha.boudnik Dasha Boudnik added a comment -

          jay vyas, if I remember correctly, the reasoning was that all non HDFS-specific tests will be included in TestCLI, and HDFS-specific tests will be in TestDFSCLI/TestHDFSCLI. So TestCLI includes HDFS tests, too. I may be wrong, but I think that's what we'd decided to do.

          Check this out: https://issues.apache.org/jira/browse/BIGTOP-1334

          Show
          dasha.boudnik Dasha Boudnik added a comment - jay vyas , if I remember correctly, the reasoning was that all non HDFS-specific tests will be included in TestCLI, and HDFS-specific tests will be in TestDFSCLI/TestHDFSCLI. So TestCLI includes HDFS tests, too. I may be wrong, but I think that's what we'd decided to do. Check this out: https://issues.apache.org/jira/browse/BIGTOP-1334
          Hide
          cos Konstantin Boudnik added a comment -

          I believe disabling trash interval isn't how HDFS clusters are configured in the wild. If this is something that breaks HCFS test runs we should make it less demanding? Perhaps making a special case of trash related tests? I am not sure...

          Show
          cos Konstantin Boudnik added a comment - I believe disabling trash interval isn't how HDFS clusters are configured in the wild. If this is something that breaks HCFS test runs we should make it less demanding? Perhaps making a special case of trash related tests? I am not sure...
          Hide
          cos Konstantin Boudnik added a comment -

          You might be right - I don't remember all the details now, but perhaps it'd make sense to remove this assertion from DFSCLI tests as they won't be executed on HCFS systems?

          Show
          cos Konstantin Boudnik added a comment - You might be right - I don't remember all the details now, but perhaps it'd make sense to remove this assertion from DFSCLI tests as they won't be executed on HCFS systems?
          Hide
          dasha.boudnik Dasha Boudnik added a comment -

          Should we maybe consider breaking the tests up by which fs they use? We keep running into issues because of HCFS vs HDFS and configurations and all that...would it maybe make sense to just separate them entirely, rather than only keeping the HDFS-specific tests separate?

          Show
          dasha.boudnik Dasha Boudnik added a comment - Should we maybe consider breaking the tests up by which fs they use? We keep running into issues because of HCFS vs HDFS and configurations and all that...would it maybe make sense to just separate them entirely, rather than only keeping the HDFS-specific tests separate?
          Hide
          cos Konstantin Boudnik added a comment -

          If by that you mean having an HCFS-sensitive tests in a separate test suite that will enforce special configurations perhaps? Something like having somewhat different driver (TestHCFSCLI ??) with its own set of asserts and assumptions, although it might use the same xml configuration files?

          Show
          cos Konstantin Boudnik added a comment - If by that you mean having an HCFS-sensitive tests in a separate test suite that will enforce special configurations perhaps? Something like having somewhat different driver (TestHCFSCLI ??) with its own set of asserts and assumptions, although it might use the same xml configuration files?
          Hide
          dasha.boudnik Dasha Boudnik added a comment -

          Perhaps. Though I was also thinking of separating the xml files, they seem to be getting out of hand a bit...

          Show
          dasha.boudnik Dasha Boudnik added a comment - Perhaps. Though I was also thinking of separating the xml files, they seem to be getting out of hand a bit...
          Hide
          cos Konstantin Boudnik added a comment -

          Are they getting out of hands because of the HDFS/HCFS differences? Or there is some other factor(s) involved?

          Show
          cos Konstantin Boudnik added a comment - Are they getting out of hands because of the HDFS/HCFS differences? Or there is some other factor(s) involved?
          Hide
          dasha.boudnik Dasha Boudnik added a comment -

          There's just a bunch of tests that are all thrown together, yeah. Maybe it's just me, but it's kinda annoying... If we're gonna separate how we run them, might as well separate where we run them from.

          Show
          dasha.boudnik Dasha Boudnik added a comment - There's just a bunch of tests that are all thrown together, yeah. Maybe it's just me, but it's kinda annoying... If we're gonna separate how we run them, might as well separate where we run them from.
          Hide
          cos Konstantin Boudnik added a comment -

          I think the best way would be come with the proposal on the new structure as a separate ticket or let's repurpose this one: either way is good with me.

          Show
          cos Konstantin Boudnik added a comment - I think the best way would be come with the proposal on the new structure as a separate ticket or let's repurpose this one: either way is good with me.
          Hide
          dasha.boudnik Dasha Boudnik added a comment -

          Cool. Moved this discussion over to BIGTOP-1750.

          Show
          dasha.boudnik Dasha Boudnik added a comment - Cool. Moved this discussion over to BIGTOP-1750 .
          Hide
          mbukatov Martin Bukatovic added a comment -

          The quick answer is that HDFS trash feature makes large number of TestCLI
          testcases fail. So I introduced this assert to make debugging easier: instead
          of inspecting seemingly random fails, the you can find the root cause of the
          issue right in the beginning of a log file.

          Why trash feature makes the testcases fail is a different story. TestCLI
          testcases are defined in large xml file which comes (with minor modifications)
          from vanilla Hadoop unit tests.

          For example here is the definition of one case:

          Unable to find source-code formatter for language: xml. Available languages are: actionscript, html, java, javascript, none, sql, xhtml, xml
          <test>
            <description>rm: removing a file (relative path) </description>           
            <test-commands>                                                           
              <command>-fs NAMENODE -mkdir -p dir</command> <!-- make sure user home dir exists -->
              <command>-fs NAMENODE -touchz file0</command>                           
              <command>-fs NAMENODE -rm -r file0</command>                            
            </test-commands>                                                          
            <cleanup-commands>                                                        
              <command>-fs NAMENODE -rm -r /user/USER_NAME/*</command>                
            </cleanup-commands>                                                       
            <comparators>                                                             
              <comparator>                                                            
                <type>RegexpComparator</type>                                         
                <expected-output>^Deleted file0</expected-output>                     
              </comparator>                                                           
            </comparators>                                                            
          </test>    
          

          It consists of list of hadoop fs commands and list of comparators. There is
          single regular expression in this case which is expected to match the output of
          hadoop fs commands.

          The problem is that when HDFS trash feature is enabled, the output is changed
          and the regular expression would not match any more:

                      Test Description: [rm: removing a file (relative path) ]
           
                         Test Commands: [-fs hdfs://dhcp-lab-203.englab.brq.redhat.com:8020 -mkdir -p dir]
                         Test Commands: [-fs hdfs://dhcp-lab-203.englab.brq.redhat.com:8020 -touchz file0]
                         Test Commands: [-fs hdfs://dhcp-lab-203.englab.brq.redhat.com:8020 -rm file0]
           
                      Cleanup Commands: [-fs hdfs://dhcp-lab-203.englab.brq.redhat.com:8020 -rm /user/bigtop/*]
           
                            Comparator: [RegexpComparator]
                    Comparision result:   [fail]
                       Expected output:   [^Deleted file0]
                         Actual output:   [Moved: 'hdfs://dhcp-lab-203.englab.brq.redhat.com:8020/user/bigtop/file0' to trash at: hdfs://dhcp-lab-203.englab.brq.redhat.com:8020/user/bigtop/.Trash/Current
          

          Note here that instead of expected output Deleted file0, the hadoop fs
          command moves the file into trash and changes the output accordingly.

          Bear in mind that this is just single example, there are more ways how HDFS
          trash breaks other tests (eg. file is expected to be deleted and the total
          size of home directory for given user is not zero ...). Moreover the number of
          testcases affected is rather large (at least this is as I remeber it to be,
          the logs of testruns which would help me to present real number here got
          rotated away into oblivion ...).

          Ok, and why is vanilla Hadoop unittests written in this way? Because it's easy
          to disable trash feature when you do unit testing of a hadoop library. But when
          you have a real cluster, it is not possible to disable trash feature on the
          client, this needs to be done on the HDFS server. Also note that this trash
          feature affects only hadoop fs commands (eg. hadoop fs -rm /user/foo/bar)
          and doesn't have any effect on any other access to the HDFS (eg. via HDFS API
          or during mapred job).

          Also I always considered that the main idea behind TestCLI tests is to easily
          and cheaply reuse unittests from vanilla Hadoop in real cluster enviroment.

          This all means that it would not be feasible to change the tests to make
          them work in both enviroments (with and without trash) because it would require
          complete rewrite of regexps so that it would match both cases (which is not
          even possible in some cases). Moreover it would make hard for us to easily accept
          changes from upstream Hadoop testcli cases as it would lose the quick and cheap
          reuse status.

          So my recommendation is to leave the assert here and either:

          • Disable the trash feature on the server entirely. Considering the impact
            which the feature has, it may be reasonable solution (Hortonworks does this
            for HDP distro in default configuration).
          • Or script the reconfiguration of the hadoop cluster to disable trash feature
            just for run of testcli.
          Show
          mbukatov Martin Bukatovic added a comment - The quick answer is that HDFS trash feature makes large number of TestCLI testcases fail. So I introduced this assert to make debugging easier: instead of inspecting seemingly random fails, the you can find the root cause of the issue right in the beginning of a log file. Why trash feature makes the testcases fail is a different story. TestCLI testcases are defined in large xml file which comes (with minor modifications) from vanilla Hadoop unit tests. For example here is the definition of one case: Unable to find source-code formatter for language: xml. Available languages are: actionscript, html, java, javascript, none, sql, xhtml, xml <test> <description>rm: removing a file (relative path) </description> <test-commands> <command>-fs NAMENODE -mkdir -p dir</command> <!-- make sure user home dir exists --> <command>-fs NAMENODE -touchz file0</command> <command>-fs NAMENODE -rm -r file0</command> </test-commands> <cleanup-commands> <command>-fs NAMENODE -rm -r /user/USER_NAME/*</command> </cleanup-commands> <comparators> <comparator> <type>RegexpComparator</type> <expected-output>^Deleted file0</expected-output> </comparator> </comparators> </test> It consists of list of hadoop fs commands and list of comparators. There is single regular expression in this case which is expected to match the output of hadoop fs commands. The problem is that when HDFS trash feature is enabled, the output is changed and the regular expression would not match any more: Test Description: [rm: removing a file (relative path) ] Test Commands: [-fs hdfs://dhcp-lab-203.englab.brq.redhat.com:8020 -mkdir -p dir] Test Commands: [-fs hdfs://dhcp-lab-203.englab.brq.redhat.com:8020 -touchz file0] Test Commands: [-fs hdfs://dhcp-lab-203.englab.brq.redhat.com:8020 -rm file0] Cleanup Commands: [-fs hdfs://dhcp-lab-203.englab.brq.redhat.com:8020 -rm /user/bigtop/*] Comparator: [RegexpComparator] Comparision result: [fail] Expected output: [^Deleted file0] Actual output: [Moved: 'hdfs://dhcp-lab-203.englab.brq.redhat.com:8020/user/bigtop/file0' to trash at: hdfs://dhcp-lab-203.englab.brq.redhat.com:8020/user/bigtop/.Trash/Current Note here that instead of expected output Deleted file0 , the hadoop fs command moves the file into trash and changes the output accordingly. Bear in mind that this is just single example, there are more ways how HDFS trash breaks other tests (eg. file is expected to be deleted and the total size of home directory for given user is not zero ...). Moreover the number of testcases affected is rather large (at least this is as I remeber it to be, the logs of testruns which would help me to present real number here got rotated away into oblivion ...). Ok, and why is vanilla Hadoop unittests written in this way? Because it's easy to disable trash feature when you do unit testing of a hadoop library. But when you have a real cluster, it is not possible to disable trash feature on the client, this needs to be done on the HDFS server. Also note that this trash feature affects only hadoop fs commands (eg. hadoop fs -rm /user/foo/bar ) and doesn't have any effect on any other access to the HDFS (eg. via HDFS API or during mapred job). Also I always considered that the main idea behind TestCLI tests is to easily and cheaply reuse unittests from vanilla Hadoop in real cluster enviroment. This all means that it would not be feasible to change the tests to make them work in both enviroments (with and without trash) because it would require complete rewrite of regexps so that it would match both cases (which is not even possible in some cases). Moreover it would make hard for us to easily accept changes from upstream Hadoop testcli cases as it would lose the quick and cheap reuse status. So my recommendation is to leave the assert here and either: Disable the trash feature on the server entirely. Considering the impact which the feature has, it may be reasonable solution (Hortonworks does this for HDP distro in default configuration). Or script the reconfiguration of the hadoop cluster to disable trash feature just for run of testcli.
          Hide
          mbukatov Martin Bukatovic added a comment -

          This fs trash issue is not specific to any Hadoop filesystem. I mean that enabled trash feature would break TestCLI testcases on any Hadoop compatible filesystem.

          Show
          mbukatov Martin Bukatovic added a comment - This fs trash issue is not specific to any Hadoop filesystem. I mean that enabled trash feature would break TestCLI testcases on any Hadoop compatible filesystem.
          Hide
          mbukatov Martin Bukatovic added a comment -

          Disabling trash feature would affect only hadoop fs command line tool and for example HDP distribution of Hortonworks does disable it by default.

          Show
          mbukatov Martin Bukatovic added a comment - Disabling trash feature would affect only hadoop fs command line tool and for example HDP distribution of Hortonworks does disable it by default.
          Hide
          mbukatov Martin Bukatovic added a comment - - edited

          Yes, org.apache.bigtop.itest.hadoop.hcfs.TestCLI executes tests from testHCFSConf.xml which are valid for any Hadoop filesystem - hence the name HCFS was chosen (hadoop compatible filesystem). On the other hand if the test is specific to HDFS, it should go to testHDFSConf.xml which is executed by org.apache.bigtop.itest.hadoop.hdfs.TestHDFSCLI.

          This also means that on Hadoop cluster with HDFS filesystem, you would run both TestCLI and TestHDFSCLI tests, while on Hadoop cluster with GlusterFS (or any other alternative filesystem), you only run TestCLI.

          Show
          mbukatov Martin Bukatovic added a comment - - edited Yes, org.apache.bigtop.itest.hadoop.hcfs.TestCLI executes tests from testHCFSConf.xml which are valid for any Hadoop filesystem - hence the name HCFS was chosen (hadoop compatible filesystem). On the other hand if the test is specific to HDFS, it should go to testHDFSConf.xml which is executed by org.apache.bigtop.itest.hadoop.hdfs.TestHDFSCLI . This also means that on Hadoop cluster with HDFS filesystem, you would run both TestCLI and TestHDFSCLI tests, while on Hadoop cluster with GlusterFS (or any other alternative filesystem), you only run TestCLI.
          Hide
          mbukatov Martin Bukatovic added a comment -

          Again, the requirement for disabled trash is not specific to any particular filesystem, but it's common to all Hadoop compatible filesystems. See my first comment in this JIRA for example where enabled trash makes tests executed on HDFS fail.

          Show
          mbukatov Martin Bukatovic added a comment - Again, the requirement for disabled trash is not specific to any particular filesystem, but it's common to all Hadoop compatible filesystems. See my first comment in this JIRA for example where enabled trash makes tests executed on HDFS fail.
          Hide
          cos Konstantin Boudnik added a comment -

          Is it still an issue? I think removing this assert will make a bunch of tests failing because test output would be messed up. Could you please confirm and either close or provide a patch for it? Thanks!

          Show
          cos Konstantin Boudnik added a comment - Is it still an issue? I think removing this assert will make a bunch of tests failing because test output would be messed up. Could you please confirm and either close or provide a patch for it? Thanks!
          Hide
          dasha.boudnik Dasha Boudnik added a comment -

          Yeah, let's close it – at least until we come up with a way to deal with this, if any. Either way, looks like we won't be removing the assert.

          Show
          dasha.boudnik Dasha Boudnik added a comment - Yeah, let's close it – at least until we come up with a way to deal with this, if any. Either way, looks like we won't be removing the assert.

            People

            • Assignee:
              dasha.boudnik Dasha Boudnik
              Reporter:
              dasha.boudnik Dasha Boudnik
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development