Bigtop / BIGTOP-740

Improve the package file content tests to ignore platform-specific file names

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.5.0
    • Fix Version/s: None
    • Component/s: tests
    • Labels:
      None

      Description

      Right now the package file content test treats the files below as different and reports errors:

      /usr/lib/hadoop-0.20-mapreduce/lib/native/Linux-amd64-64 (in centos5_x64)
      /usr/lib/hadoop-0.20-mapreduce/lib/native/Linux-i386-32 (in centos6_x32)
      ......
      /usr/lib64/libhdfs.so (in centos5_x64)
      /usr/lib/libhdfs.so (in centos6_x32)
      

      The test should ignore this kind of difference, since these are platform-specific names.

      Another type of file name difference we should ignore is the dependency version. For example, the hue-common package detects the Python version installed on the OS and installs the corresponding version of the Python-related files:

      /usr/share/hue/build/env/bin/pip-2.4  (in centos5)
      /usr/share/hue/build/env/bin/pip-2.6  (in centos6)
      

      Those differences should also be ignored.

      Attachment: BIGTOP_740.txt (2 kB, Johnny Zhang)

        Activity

        Johnny Zhang added a comment (edited)

        The solution I have right now is to hard-code a list of substrings to trim from the file names, like "-amd64-64" and "-i386-32".

        But this might not be a good idea for the dependency version case (like Python 2.4/2.6), as we don't want to hard-code and maintain such a list in the code.
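A minimal sketch of the hard-coded trimming approach, written in Python for illustration only (the real test code is Groovy; `PLATFORM_SUBSTRINGS`, `trim_platform_substrings` and `content_diff` are hypothetical names). It shows why the list handles the native library directory but not dependency versions: `pip-2.4` and `pip-2.6` still differ unless someone adds another entry.

```python
# Hypothetical hand-maintained list of platform-specific substrings to strip;
# every new platform or dependency version needs a new entry here.
PLATFORM_SUBSTRINGS = ["-amd64-64", "-i386-32"]


def trim_platform_substrings(name: str) -> str:
    """Remove each known platform-specific substring from a file name."""
    for substring in PLATFORM_SUBSTRINGS:
        name = name.replace(substring, "")
    return name


def content_diff(golden: set[str], runtime: set[str]) -> set[str]:
    """File names (after trimming) that appear on only one side."""
    trimmed_golden = {trim_platform_substrings(n) for n in golden}
    trimmed_runtime = {trim_platform_substrings(n) for n in runtime}
    return trimmed_golden ^ trimmed_runtime
```

With this list, `.../native/Linux-amd64-64` and `.../native/Linux-i386-32` both become `.../native/Linux` and stop being reported, while the hue-common `pip-2.4`/`pip-2.6` pair is still flagged.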

        Roman Shaposhnik added a comment

        One option, perhaps, is to explore the idea of introducing pattern matching into our content comparison code. IOW, we could have entries like:

            <file name='/usr/lib/hadoop-0.20-mapreduce/lib/native/Linux-*-*' owners='1' perm='drwxr-xr-x' user='root' group='root' />
        

        The only caveat here is that the debug output (https://github.com/apache/bigtop/blob/trunk/bigtop-tests/test-artifacts/package/src/main/groovy/org/apache/bigtop/itest/packagesmoke/PackageTestCommon.groovy#L416)
        should be smart enough to recognize the patterned strings and output them instead.
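The pattern idea could look roughly like this Python sketch, assuming golden names may contain shell-style wildcards matched with the standard `fnmatch` module (`unmatched` is a hypothetical helper, not existing Bigtop code). Golden entries that match nothing are reported exactly as written, pattern included, which is what the debug-output caveat asks for.

```python
from fnmatch import fnmatchcase


def unmatched(golden: list[str], runtime: list[str]) -> tuple[list[str], list[str]]:
    """Compare golden entries (which may contain wildcards) with runtime names.

    Returns (missing, extra): golden entries that match no runtime name,
    reported as written (pattern included), and runtime names matched by no
    golden entry. Note that fnmatch's '*' also matches '/', unlike a shell glob.
    """
    missing = [g for g in golden if not any(fnmatchcase(r, g) for r in runtime)]
    extra = [r for r in runtime if not any(fnmatchcase(r, g) for g in golden)]
    return missing, extra
```

A golden entry such as `/usr/lib*/libhdfs.so` would then cover both `/usr/lib/libhdfs.so` and `/usr/lib64/libhdfs.so`.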

        Johnny Zhang added a comment (edited)

        A little explanation of the patch:
        I changed the code so that it rewrites file names in both the golden metadata and the runtime metadata to the same uniform string. For example:
        in the golden metadata, the file name is "/usr/lib/hadoop-0.20-mapreduce/lib/native/Linux-amd64-64"
        in the runtime metadata, the file name is "/usr/lib/hadoop-0.20-mapreduce/lib/native/Linux-i386-32"
        The code changes both of them into the uniform string "/usr/lib/hadoop-0.20-mapreduce/lib/native/Linux-BIT-VERSION", so they match during comparison.

        @Roman, this change also makes the code write the uniform string into the debug file in the target dir. For example, the 'hadoop-0.20-mapreduce.xml' generated in the target dir will look like:

        ......
        <file name='/usr/lib/hadoop-0.20-mapreduce/lib/native/Linux-BIT-VERSION' owners='-1' perm='lrwxrwxrwx' user='root' group='root' target='/usr/lib/hadoop/lib/native' />
        ......
        

        The reason we cannot convert every "i386" string to "amd64", or every "/usr/lib/libhdfs" to "/usr/lib64/libhdfs", is that doing so would turn other packages' correct file names into incorrect ones. (I found that changing all "/usr/lib/libhdfs" to "/usr/lib64/libhdfs" makes the libhdfs package file content test pass but makes the libhdfs0 package file content test fail.) So changing both into a uniform string like "BIT-VERSION" is a better solution.
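A sketch of the uniform-string normalization, assuming regex rules applied identically to golden and runtime names (Python for illustration; only the `Linux-BIT-VERSION` token comes from the patch, while the `LIB-DIR` token, the libhdfs rule and the function names are hypothetical). Because both sides are rewritten to the same neutral token, no package's correct name is turned into another platform's name.

```python
import re

# Hypothetical rules: each maps a platform-specific fragment to a neutral token.
NORMALIZATION_RULES = [
    (re.compile(r"/native/Linux-(?:amd64-64|i386-32)"), "/native/Linux-BIT-VERSION"),
    (re.compile(r"^/usr/lib(?:64)?/libhdfs"), "/usr/LIB-DIR/libhdfs"),  # hypothetical token
]


def normalize(name: str) -> str:
    """Apply every rule; used on golden and runtime names alike."""
    for pattern, replacement in NORMALIZATION_RULES:
        name = pattern.sub(replacement, name)
    return name


def content_diff(golden: set[str], runtime: set[str]) -> set[str]:
    """Normalized names that appear on only one side."""
    return {normalize(n) for n in golden} ^ {normalize(n) for n in runtime}
```

Since neither platform's spelling is treated as canonical, the same rules can serve the libhdfs and libhdfs0 packages without one test passing at the other's expense.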

        Johnny Zhang added a comment

        The only problematic one is the hue-common package in the RPM family. If we use the golden metadata generated on a CentOS 5 system, we find lots of differing files installed by the hue-common package on CentOS 6 or Red Hat 6 systems.

        For example:
        python2.6 vs python2.4
        python vs pip
        py2.6.egg vs py2.4.egg
        -egg vs .egg
        easy_install-2.6 vs easy_install-2.4
        ......

        As described above, these differences arise because on CentOS 6/Red Hat 6 the hue-common install depends on Python 2.6, while on CentOS 5 it depends on Python 2.4. Moreover, these Python file differences are not specific to our package but to Python itself. So we should ignore file content differences for the hue-common package; otherwise we will end up with a long (20+) list of hard-coded patterns.
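The package-level exclusion proposed for hue-common amounts to something like the following Python sketch (`SKIP_CONTENT_CHECK` and `check_package_content` are hypothetical names). The exclusion set is itself a hard-coded list, just a much shorter one than 20+ file patterns.

```python
# Hypothetical exclusion list: packages whose installed files depend on the
# OS's Python version (2.4 on CentOS 5, 2.6 on CentOS 6 / Red Hat 6).
SKIP_CONTENT_CHECK = {"hue-common"}


def check_package_content(package: str, golden: set[str], runtime: set[str]) -> set[str]:
    """Return file names present on only one side, or nothing for excluded packages."""
    if package in SKIP_CONTENT_CHECK:
        return set()
    return golden ^ runtime
```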

        Roman Shaposhnik added a comment

        Johnny, I'm not sure hard-coding pattern matching (like you have for very particular patterns) is a good idea. If you want this level of flexibility I think it needs to be made part of some kind of external configuration.

        Johnny Zhang added a comment

        @Roman, thanks for the comments. I agree that hard-coding patterns into Groovy is not a good option.

        How about we check in a file, in an external location, that holds the file content diffs we want to ignore; let's call it to-be-ignored.diff. During the file content test, any runtime diff included in this file gets ignored. This way we don't have to hard-code any patterns in the code, and the file is easy to maintain (just generate the diff and check it in). An added benefit of this method is that it also makes it easy to handle the file content diffs introduced by packages like hue-common (the Python issue).
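A rough sketch of the to-be-ignored.diff idea, assuming the file uses unified-diff-style lines where `-name` marks a golden-only file and `+name` a runtime-only file. The comment does not specify a format, so the format and all function names here are assumptions. Any runtime difference listed verbatim in the file is dropped.

```python
def parse_ignore_file(text: str) -> set[str]:
    """Collect the '-name' / '+name' lines listed in a to-be-ignored.diff file."""
    ignored = set()
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and unified-diff file headers ('--- a', '+++ b').
        if not line or line.startswith(("---", "+++")):
            continue
        if line[0] in "+-":
            ignored.add(line)
    return ignored


def diff_lines(golden: set[str], runtime: set[str]) -> set[str]:
    """Render differences as '-name' (golden only) and '+name' (runtime only)."""
    return {"-" + n for n in golden - runtime} | {"+" + n for n in runtime - golden}


def unexpected_diff(golden: set[str], runtime: set[str], ignore_text: str) -> set[str]:
    """Differences that are not listed in the ignore file."""
    return diff_lines(golden, runtime) - parse_ignore_file(ignore_text)
```

Maintaining the filter then means regenerating the diff on a known-good system and checking it in, with no pattern logic in the test code.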

        Johnny Zhang added a comment

        This patch also needs one more fix. The generated debug file should really print the original file name, NOT something like:

        ......
        <file name='/usr/lib/hadoop-0.20-mapreduce/lib/native/Linux-BIT-VERSION' owners='-1' perm='lrwxrwxrwx' user='root' group='root' target='/usr/lib/hadoop/lib/native' />
        ......
        

        I'm working on a new patch that combines the above two things.
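One way to get both behaviors, sketched in Python under the assumption that the `Linux-BIT-VERSION` rewrite is the only normalization rule (function names are hypothetical, and the debug entry is simplified to the `name` attribute): compare on normalized keys, but keep the original names so that the report and the debug file show what was actually installed.

```python
import re

# Assumed single rule, taken from the patch description.
BIT_VERSION = re.compile(r"Linux-(?:amd64-64|i386-32)")


def normalize(name: str) -> str:
    """Key used only for comparison, never for output."""
    return BIT_VERSION.sub("Linux-BIT-VERSION", name)


def index_by_key(names: list[str]) -> dict[str, list[str]]:
    """Map each normalized key to the original names that produced it."""
    index: dict[str, list[str]] = {}
    for name in names:
        index.setdefault(normalize(name), []).append(name)
    return index


def report(golden: list[str], runtime: list[str]) -> tuple[list[str], list[str]]:
    """Compare on normalized keys, but return original names for the differences."""
    g, r = index_by_key(golden), index_by_key(runtime)
    missing = sorted(n for key in g.keys() - r.keys() for n in g[key])
    extra = sorted(n for key in r.keys() - g.keys() for n in r[key])
    return missing, extra


def debug_lines(runtime: list[str]) -> list[str]:
    """Simplified debug-file entries built from the original runtime names."""
    return [f"<file name='{name}' />" for name in sorted(runtime)]
```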

        Roman Shaposhnik added a comment

        Johnny, I like the idea of having a 'filter' file. Let's just put it in resources with the rest of them.


          People

          • Assignee:
            Johnny Zhang
            Reporter:
            Johnny Zhang
          • Votes:
            0
            Watchers:
            2
