Apache Rat / RAT-323 Harmonize UIs / RAT-265

CLI: Certain wildcard file filters do not work anymore

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.13, 0.14
    • Fix Version/s: 0.17
    • Component/s: Client - cli
    • Labels: None

    Description

      Run the following command in the root of the `rat` repo:

      java -jar apache-rat-0.14-20191120.132901-66.jar -e "*.txt" -d apache-rat-core/src/test/resources/violations

      This will give the following output on `stderr`: 

      Will skip given exclusion '*.txt' due to java.util.regex.PatternSyntaxException: Dangling meta character '*' near index 0
      *.txt
      ^
      

      Furthermore, `bad.txt` will NOT be excluded from the license check.

      The error that causes this is thrown in line 132 of `org.apache.rat.Report.java`. The reason is simple: any glob pattern that starts with `*` or `?` is not a valid regex. When line 132 throws, the next two lines are skipped as well, so the pattern is not added at all.
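
      To make the failure concrete, here is a minimal sketch that only uses `java.util.regex` from the JDK. The class name `GlobAsRegexDemo` and the helper `isValidRegex` are made up for illustration and are not part of Rat; the sketch merely shows that a glob with a leading `*` or `?` cannot be compiled as a regex.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class, not part of Apache Rat.
public class GlobAsRegexDemo {

    /** Returns true if the given string compiles as a java.util.regex pattern. */
    public static boolean isValidRegex(String candidate) {
        try {
            Pattern.compile(candidate);
            return true;
        } catch (PatternSyntaxException e) {
            // A leading '*' or '?' is a quantifier with nothing to quantify.
            return false;
        }
    }

    public static void main(String[] args) {
        for (String pattern : new String[] {"*.txt", "?.txt", "bad.txt"}) {
            System.out.println(pattern + " -> valid regex: " + isValidRegex(pattern));
        }
    }
}
```

      Code that compiles the exclusion as a regex before registering any other filter, inside the same `try` block, will therefore drop such a pattern completely.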

      Unfortunately, a solution to this problem is not so simple. In `v0.12`, the `-e` option always added wildcard filters while `-E` always added regex filters. The documentation still states the same in the latest `v0.14` snapshot. Beginning with `v0.13`, the code tries to add every exclude rule as three different filters (name, wildcard and regex). I believe this approach is inherently flawed.

      Firstly, the `new NameFileFilter(exclusion)` is redundant if we also add `new WildcardFileFilter(exclusion)`. The files matched by the `NameFileFilter` are a subset of those matched by the `WildcardFileFilter` since any magic character (i.e. `?` or `*`) in `exclusion` also matches itself when used in a `WildcardFileFilter`.
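
      The subset argument can be checked with a small sketch. The matcher below is a hand-rolled stand-in that supports only `*` and `?`; it is an assumption for illustration and not the commons-io implementation behind `WildcardFileFilter`. The names `WildcardSupersetDemo`, `nameMatch` and `wildcardMatch` are hypothetical.

```java
// Hypothetical demo class; a simplified wildcard matcher, not commons-io code.
public class WildcardSupersetDemo {

    /** Exact name comparison, standing in for a name-based filter. */
    public static boolean nameMatch(String exclusion, String fileName) {
        return exclusion.equals(fileName);
    }

    /** Minimal wildcard match: '*' = any run of characters, '?' = any single character. */
    public static boolean wildcardMatch(String pattern, String name) {
        if (pattern.isEmpty()) {
            return name.isEmpty();
        }
        char first = pattern.charAt(0);
        String rest = pattern.substring(1);
        if (first == '*') {
            // Try every possible length for the run consumed by '*'.
            for (int i = 0; i <= name.length(); i++) {
                if (wildcardMatch(rest, name.substring(i))) {
                    return true;
                }
            }
            return false;
        }
        if (name.isEmpty()) {
            return false;
        }
        return (first == '?' || first == name.charAt(0))
                && wildcardMatch(rest, name.substring(1));
    }

    public static void main(String[] args) {
        for (String exclusion : new String[] {"*.txt", "b?d.txt", "bad.txt"}) {
            // Whatever the name comparison accepts, the wildcard match accepts too.
            System.out.println(exclusion + ": name=" + nameMatch(exclusion, exclusion)
                    + ", wildcard=" + wildcardMatch(exclusion, exclusion));
        }
    }
}
```

      Under these simplified semantics, every string matches itself as a wildcard pattern, so the name comparison never accepts a file that the wildcard match rejects.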

      So let's assume we only register the `WildcardFileFilter` and the `RegexFileFilter`. Even if we properly add patterns that are not valid regexes as wildcard filters, there are still patterns where we cannot decide what the user's intention was. Consider the pattern `bi.ini`. Should it be interpreted as a wildcard pattern and match only itself, or should it be interpreted as a regex and also match `bikini`, for example?
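
      The ambiguity of `bi.ini` can be reproduced with the JDK alone. In the sketch below, the JDK's `glob:` path matcher is used as a stand-in for a wildcard filter; that substitution is an assumption, since Rat itself uses the commons-io filters. The class and method names are hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of Apache Rat.
public class AmbiguousPatternDemo {

    /** Interprets the pattern as a regex that must match the whole file name. */
    public static boolean matchesAsRegex(String pattern, String fileName) {
        return Pattern.matches(pattern, fileName);
    }

    /** Interprets the pattern as a glob, using the JDK's "glob:" syntax as a stand-in. */
    public static boolean matchesAsGlob(String pattern, String fileName) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + pattern);
        return matcher.matches(Paths.get(fileName));
    }

    public static void main(String[] args) {
        String pattern = "bi.ini";
        for (String fileName : new String[] {"bi.ini", "bikini"}) {
            System.out.println(fileName
                    + ": regex=" + matchesAsRegex(pattern, fileName)
                    + ", glob=" + matchesAsGlob(pattern, fileName));
        }
    }
}
```

      Both interpretations accept `bi.ini`, but only the regex interpretation accepts `bikini`. The pattern text alone cannot tell us which reading the user meant, which is why the option (`-e` or `-E`) should decide it.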

      My recommendation for a quick patch solution would be to go back to the exclusion behavior of `v0.12`.

      Beyond that, the nicest solution IMHO would be support for ignore files with the same semantics as `.gitignore` (via `-E`) and support for giving extended shell globs via `-e`.

          People

            claude Claude Warren
            raphinesse Raphael von der Grün