The quick answer is that the HDFS trash feature makes a large number of TestCLI
testcases fail. So I introduced this assert to make debugging easier: instead
of inspecting seemingly random failures, you can find the root cause of the
issue right at the beginning of the log file.
Why the trash feature makes the testcases fail is a different story. TestCLI
testcases are defined in a large xml file which comes (with minor modifications)
from the vanilla Hadoop unit tests.
For example, here is the definition of one case:
<description>rm: removing a file (relative path) </description>
<command>-fs NAMENODE -mkdir -p dir</command> <!-- make sure user home dir exists -->
<command>-fs NAMENODE -touchz file0</command>
<command>-fs NAMENODE -rm -r file0</command>
<command>-fs NAMENODE -rm -r /user/USER_NAME/*</command>
It consists of a list of hadoop fs commands and a list of comparators. There is
a single regular expression in this case, which is expected to match the output
of the hadoop fs commands.
The problem is that when the HDFS trash feature is enabled, the output changes
and the regular expression no longer matches:
Test Description: [rm: removing a file (relative path) ]
Test Commands: [-fs hdfs://dhcp-lab-203.englab.brq.redhat.com:8020 -mkdir -p dir]
Test Commands: [-fs hdfs://dhcp-lab-203.englab.brq.redhat.com:8020 -touchz file0]
Test Commands: [-fs hdfs://dhcp-lab-203.englab.brq.redhat.com:8020 -rm file0]
Cleanup Commands: [-fs hdfs://dhcp-lab-203.englab.brq.redhat.com:8020 -rm /user/bigtop/*]
Comparision result: [fail]
Expected output: [^Deleted file0]
Actual output: [Moved: 'hdfs://dhcp-lab-203.englab.brq.redhat.com:8020/user/bigtop/file0' to trash at: hdfs://dhcp-lab-203.englab.brq.redhat.com:8020/user/bigtop/.Trash/Current
Note here that instead of printing the expected output Deleted file0, the
hadoop fs command moves the file into trash and changes its output accordingly.
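
The mismatch can be reproduced outside of Hadoop. The following is only a
minimal sketch in Python: the pattern is the one from the "Expected output"
line above, the host name "namenode" is a hypothetical stand-in for the real
one, and the matches() helper merely approximates a line-by-line regexp
comparison (the real comparator lives in the Hadoop test code and may differ
in details).

```python
import re

# Pattern copied from the "Expected output" line of the failing test.
expected = re.compile(r"^Deleted file0")

# Output the test was written for (trash disabled).
output_without_trash = "Deleted file0"

# Output with trash enabled; "namenode" is a hypothetical host name.
output_with_trash = (
    "Moved: 'hdfs://namenode:8020/user/bigtop/file0' to trash at: "
    "hdfs://namenode:8020/user/bigtop/.Trash/Current"
)


def matches(pattern, output):
    """Return True if at least one line of the output matches the pattern."""
    return any(pattern.search(line) for line in output.splitlines())


print(matches(expected, output_without_trash))  # True
print(matches(expected, output_with_trash))     # False
```

The file name file0 still appears in the second output, but the anchored
pattern requires the line to start with "Deleted", so the comparison fails.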
Bear in mind that this is just a single example; there are more ways in which
HDFS trash breaks other tests (e.g. a file is expected to be deleted, but the
total size of the user's home directory is not zero ...). Moreover, the number
of testcases affected is rather large (at least as I remember it; the logs of
the test runs which would let me present a real number here got rotated away
into oblivion ...).
OK, and why are the vanilla Hadoop unit tests written this way? Because it's
easy to disable the trash feature when you unit test a hadoop library. But when
you have a real cluster, it is not possible to disable the trash feature on the
client; this needs to be done on the HDFS server. Also note that the trash
feature affects only hadoop fs commands (e.g. hadoop fs -rm /user/foo/bar)
and doesn't have any effect on any other access to HDFS (e.g. via the HDFS API
or during a mapred job).
Also, I have always considered the main idea behind TestCLI tests to be easy
and cheap reuse of unit tests from vanilla Hadoop in a real cluster environment.
This all means that it would not be feasible to change the tests to make
them work in both environments (with and without trash), because it would require
a complete rewrite of the regexps so that they match both cases (which is not
even possible in some cases). Moreover, it would make it hard for us to accept
changes from upstream Hadoop testcli cases, as we would lose the quick and cheap
reuse.
So my recommendation is to leave the assert here and either:
- Disable the trash feature on the server entirely. Considering the impact
which the feature has, it may be a reasonable solution (Hortonworks does this
for the HDP distro in its default configuration).
- Or script the reconfiguration of the hadoop cluster to disable the trash
feature just for the run of testcli.
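
To illustrate the kind of fail-fast check the assert performs, here is a
rough sketch in Python. It is not the actual assert from the test code. It
assumes the standard Hadoop property fs.trash.interval, where 0 means trash is
disabled, and it only inspects configuration text handed to it; obtaining the
effective server-side value is left out. The function names and the SAMPLE
configuration are hypothetical.

```python
import xml.etree.ElementTree as ET


def trash_interval(config_xml):
    """Return fs.trash.interval from Hadoop-style config XML, or 0 if unset."""
    root = ET.fromstring(config_xml)
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name == "fs.trash.interval":
            value = (prop.findtext("value") or "").strip() or "0"
            return int(value)
    return 0


def check_trash_disabled(config_xml):
    """Raise AssertionError early if trash is enabled in the given config."""
    interval = trash_interval(config_xml)
    if interval != 0:
        raise AssertionError(
            "HDFS trash is enabled (fs.trash.interval=%d); "
            "TestCLI expects it to be disabled" % interval
        )


# Hypothetical configuration with trash enabled (360 minutes).
SAMPLE = """<configuration>
  <property>
    <name>fs.trash.interval</name>
    <value>360</value>
  </property>
</configuration>"""

try:
    check_trash_disabled(SAMPLE)
except AssertionError as err:
    print(err)
```

Running such a check before the first testcase puts a single clear message at
the top of the log instead of many unrelated-looking comparison failures.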