Apache NiFi / NIFI-5685

Allow processors' relationships to be grouped together




      One of the key tenets of NiFi is that a Processor knows whether or not it failed to do its specific job, but does not know whether a 'failure' occurred within the context of the flow itself. This is why processors so often have a 'success' and a 'failure' relationship and let the user choose how to handle a 'failure', instead of relying on a more abstract mechanism such as a Dead-Letter Queue.

      Quite often, though, a processor can fail to perform its task for many different reasons. For example, if a PutFile processor fails, the FlowFile is routed to 'failure', and without looking at logs, bulletins, etc., it may not be clear to the user why it failed. Did the destination directory not exist? Was there already a file with that name? Was the disk out of space, or was there a general I/O problem? Sometimes a user wants to handle each of these failures differently.

      At present, we tend to do one of two things:

      1) Add a new relationship. We may now have a separate relationship for 'duplicate.filename', one for 'directory.missing', one for 'io.failure', etc. Unfortunately, adding new relationships can make existing flows invalid, because not all of the relationships are connected. Additionally, when users create a connection or auto-terminate relationships, they now have many different relationships to deal with, which is a pain if they want to treat all failures the same way. This approach also often leads to poorly named relationships such as 'Retry', because, as described above, the developer cannot know at compile time whether the FlowFile should be retried; that depends on the context of the flow itself and on the user's intent.

      2) The second approach that is sometimes taken is to add an attribute such as "failure.reason". This is problematic for several reasons. First, users must then route 'failure' to a RouteOnAttribute processor and route the FlowFile based on each of the possible conditions. Second, all of the conditions must be clearly documented and must not change. Third, it is error-prone, because it is easy to make a typo or forget a particular value in the RouteOnAttribute configuration.
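To see why the first approach can break existing flows, here is a minimal sketch of the validity rule, assuming a processor is valid only when every relationship it declares is either connected or auto-terminated. The class and method names (`RelationshipValidity`, `isValid`) are hypothetical and are not part of the NiFi API; relationships are modeled as plain strings.

```java
import java.util.List;
import java.util.Set;

public class RelationshipValidity {

    // Hypothetical check: a processor is valid only if every declared relationship
    // is handled, either by a connection or by auto-termination.
    static boolean isValid(List<String> declared, Set<String> connected, Set<String> autoTerminated) {
        for (String relationship : declared) {
            if (!connected.contains(relationship) && !autoTerminated.contains(relationship)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> connected = Set.of("success");
        Set<String> autoTerminated = Set.of("failure");

        // The relationships the processor declared when the flow was built.
        List<String> before = List.of("success", "failure");
        // A later version adds a more specific relationship the flow never handled.
        List<String> after = List.of("success", "failure", "io.failure");

        System.out.println(isValid(before, connected, autoTerminated)); // true
        System.out.println(isValid(after, connected, autoTerminated));  // false
    }
}
```

The flow itself did not change; only the processor's declared relationships did, and that alone is enough to make it invalid under this rule.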

      So, I propose allowing Relationships to be grouped together. From the Processor developer's point of view, it might look like the following:

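      final Relationship SUCCESS = new Relationship.Builder()
        .name("success")
        .description("Data successfully written to disk")
        .build(); // no grouping
      final Relationship DUPLICATE_FILENAME = new Relationship.Builder()
        .name("duplicate.filename")
        .description("A file already exists with the same filename")
        .group("failure") // proposed method; does not exist in the current Relationship.Builder API
        .build();
      final Relationship IO_FAILURE = new Relationship.Builder()
        .name("io.failure")
        .description("Unable to store the data to disk due to an I/O failure, such as too many open files, out of storage space, etc.")
        .group("failure") // proposed method; does not exist in the current Relationship.Builder API
        .build();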

      In the UI, then, when a user is creating or updating a Connection, or auto-terminating Relationships, they should be able to choose "success" or "failure", or expand "failure" and choose individual relationships. The general "failure" relationship does not actually exist; it is simply a grouping of all of the "failure" relationships.
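One way the framework could expand a selection such as "failure" into concrete relationships is sketched below. This is a minimal sketch under stated assumptions: each relationship carries an optional group name, and a selected name matches either a group (expanding to all of its members) or a single relationship. The `RelationshipGroups` class, the `Rel` record, and `resolve` are hypothetical names, not NiFi API, and the record syntax requires Java 16 or later.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

public class RelationshipGroups {

    // Hypothetical model: a relationship name plus an optional group name.
    record Rel(String name, Optional<String> group) {}

    // Expands a user's selection into concrete relationship names.
    // A selected name matching a group expands to every member of that group;
    // otherwise it must match an individual relationship name.
    static Set<String> resolve(List<Rel> declared, Set<String> selection) {
        Set<String> resolved = new LinkedHashSet<>();
        for (String selected : selection) {
            boolean matched = false;
            for (Rel rel : declared) {
                boolean inGroup = rel.group().map(selected::equals).orElse(false);
                if (rel.name().equals(selected) || inGroup) {
                    resolved.add(rel.name());
                    matched = true;
                }
            }
            if (!matched) {
                throw new IllegalArgumentException("Unknown relationship or group: " + selected);
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        List<Rel> declared = List.of(
            new Rel("success", Optional.empty()),
            new Rel("duplicate.filename", Optional.of("failure")),
            new Rel("io.failure", Optional.of("failure")));

        // Selecting the group picks up every member of it.
        System.out.println(resolve(declared, Set.of("failure")));            // [duplicate.filename, io.failure]
        // Selecting one member routes only that relationship.
        System.out.println(resolve(declared, Set.of("duplicate.filename"))); // [duplicate.filename]
    }
}
```

In this sketch, a connection that stores the group name rather than a fixed list is resolved each time it is evaluated, so a relationship added to the "failure" group in a later processor version would be covered without the user editing the flow. Whether NiFi should resolve groups this way is a design choice the proposal leaves open.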

      This still gives the user an easy way to select the appropriate relationships, while also giving them much more control over how data is routed when a failure occurs. Additionally, routing to 'duplicate.filename', for instance, means that the Provenance data will carry much more context, so that users can later understand why the failure occurred. And it does this without the error-prone steps required by the second approach described above.




            • Assignee:
              markap14 Mark Payne