  Apache NiFi / NIFI-6286

Make ListHDFS work as an INPUT_ALLOWED processor -> create new ScanHDFS processor


    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.9.2
    • Fix Version/s: None
    • Component/s: Core Framework
    • Labels:

      Description

      Currently the ListHDFS processor has a 'Directory' property (the root from which the listing starts, recursively or not) that accepts only a single static value.

      Many use cases call for crawling several root directories in sequence. There are two ways to support this:

      1. Allow the 'Directory' property to hold multiple comma-separated values.
      2. Refactor ListHDFS as an INPUT_ALLOWED processor and let the 'Directory' property accept Expression Language (EL), so that directory roots can be supplied from upstream.

      Option 1 is seriously limited: the other settings (such as recursion, filter type and regex) would still be static and shared by every root, so the configuration may become very complex, non-intuitive and in need of frequent re-configuration.

      Option 2 is the way to go.
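
      As a rough illustration of option 2, the sketch below resolves a 'Directory' value against the attributes of an incoming FlowFile. It is a simplified stand-in and not NiFi's Expression Language engine: it handles only plain ${attribute} references, and the class name DirectoryResolver and the attribute name hdfs.root are hypothetical. A value without references passes through unchanged, which is how the existing static configuration could keep working.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; a simplified stand-in for Expression Language evaluation. */
final class DirectoryResolver {
    // Matches plain ${attributeName} references only; real EL also supports functions.
    private static final Pattern REF = Pattern.compile("\\$\\{([^}]+)\\}");

    private DirectoryResolver() {}

    /**
     * Resolves the 'Directory' property value against FlowFile attributes.
     * Pass an empty map when there is no incoming FlowFile; a static value
     * is then returned as-is. Unknown attributes resolve to an empty string
     * (an assumption made for this sketch).
     */
    static String resolve(String propertyValue, Map<String, String> flowFileAttributes) {
        Matcher m = REF.matcher(propertyValue);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = flowFileAttributes.getOrDefault(m.group(1), "");
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

      With this shape, an upstream processor could set hdfs.root on each FlowFile and the property could be configured once as ${hdfs.root}/in, while a plain value such as /data/landing would still work without any input.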

      Some things to consider:

      - The current behavior of ListHDFS should be preserved.

      - It makes sense to set 'Directory', 'Recursiveness', 'Regex' and 'Filter type' dynamically and in tandem, so that the way each root directory is crawled can be specified in detail.

      - Switching 'Directory' also means storing not just one state but a separate state for every directory that has ever been passed in.
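
      The last point can be sketched as a listing watermark kept per root directory rather than a single one. This is a minimal illustration under assumptions, not how ListHDFS tracks its state today: the class PerDirectoryListingState and its method are hypothetical, the state lives in memory only, and files that share the watermark timestamp are not handled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: one listing watermark per root directory. */
final class PerDirectoryListingState {
    // Key: root directory; value: newest modification time (epoch millis) already listed.
    private final Map<String, Long> latestListed = new HashMap<>();

    /**
     * Returns the paths under the given root that are newer than that root's
     * stored watermark, then advances only that root's watermark.
     *
     * @param root    resolved 'Directory' value for this invocation
     * @param entries path to modification time (epoch millis), as found by the crawl
     */
    List<String> selectNew(String root, Map<String, Long> entries) {
        long watermark = latestListed.getOrDefault(root, Long.MIN_VALUE);
        long newest = watermark;
        List<String> fresh = new ArrayList<>();
        for (Map.Entry<String, Long> e : entries.entrySet()) {
            if (e.getValue() > watermark) {
                fresh.add(e.getKey());
                newest = Math.max(newest, e.getValue());
            }
        }
        latestListed.put(root, newest);
        return fresh;
    }
}
```

      Because each root has its own entry, listing /data/b does not disturb what has already been recorded for /data/a. A real implementation would also have to persist this map and decide when entries for directories that are no longer used can be dropped.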

              People

              • Assignee: Jasper Knulst (jasperknulst)
              • Reporter: Jasper Knulst (jasperknulst)
              • Votes: 0
              • Watchers: 4

                Dates

                • Created:
                  Updated:

                  Time Tracking

                  • Estimated: Not Specified
                  • Remaining: 0h
                  • Logged: 50m