  1. Apache NiFi
  2. NIFI-6275

ListHDFS with Full Path filter mode regex does not work as intended

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.8.0, 1.9.0, 1.9.1, 1.9.2
    • Fix Version/s: 1.10.0
    • Labels:
      None

      Description

      When using the Full Path filter mode, the regex is applied to the URI returned for each file, which includes the scheme and authority (hostname, HA namespace, port). For the filter to work across multiple HDFS installations (for example, a flow that is retrieved from NiFi Registry and used in several environments), the regex would have to match every possible scheme and authority value.

      To make this easier for the user, the Full Path filter mode's regex should be applied only to the path components of the URI, without the scheme and authority. This can be done by updating the filter for Full Path mode to use Path.getPathWithoutSchemeAndAuthority(Path). This brings the Full Path regex values in line with the other modes, which apply the regex only to the value of Path.getName().

      Migration guidance will be needed when this improvement is released. Existing regex values for Full Path filter mode that accept any scheme and authority will still work. Those that specify a scheme and authority will no longer match, and will have to be updated to specify only path components.
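
      The difference between the two behaviors, and the migration impact, can be sketched as follows. This is a minimal illustration and not the ListHDFS source: the class name, method names, host names, and paths are hypothetical, java.net.URI.getPath() stands in for Hadoop's Path.getPathWithoutSchemeAndAuthority(Path) so the sketch runs without Hadoop on the classpath, and it assumes the filter requires the regex to match the whole string.

```java
import java.net.URI;
import java.util.regex.Pattern;

// Hypothetical sketch of the Full Path filter behavior; not NiFi code.
class FullPathFilterSketch {

    // Behavior described in this issue: the regex sees the whole URI,
    // including scheme and authority (assumes a full-string match).
    static boolean matchesFullUri(String regex, String fileUri) {
        return Pattern.compile(regex).matcher(fileUri).matches();
    }

    // Proposed behavior: the regex sees only the path components.
    // URI.getPath() is a stand-in for Path.getPathWithoutSchemeAndAuthority.
    static boolean matchesPathOnly(String regex, String fileUri) {
        String path = URI.create(fileUri).getPath();
        return Pattern.compile(regex).matcher(path).matches();
    }

    public static void main(String[] args) {
        // Same file location on two hypothetical HDFS installations.
        String dev = "hdfs://dev-nn:8020/data/incoming/a.csv";
        String prod = "hdfs://prodns/data/incoming/a.csv";

        // A path-only regex fails against the full URI but works once
        // scheme and authority are stripped, on both installations.
        String pathRegex = "/data/incoming/.*";
        System.out.println(matchesFullUri(pathRegex, dev));   // false
        System.out.println(matchesPathOnly(pathRegex, dev));  // true
        System.out.println(matchesPathOnly(pathRegex, prod)); // true

        // Migration: a regex that accepts any prefix still matches,
        // while one that names a scheme and authority no longer does.
        System.out.println(matchesPathOnly(".*/data/incoming/.*", prod));               // true
        System.out.println(matchesPathOnly("hdfs://dev-nn:8020/data/incoming/.*", dev)); // false
    }
}
```

      With the path-only form, a single regex value can be shared across environments, which is the portability the issue asks for.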

              People

              • Assignee:
                Jeff Storck (jtstorck)
                Reporter:
                Jeff Storck (jtstorck)
              • Votes:
                1
                Watchers:
                4

                  Time Tracking

                  Estimated:
                  Not Specified
                  Remaining:
                  0h
                  Logged:
                  1h