Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.13.2
Fix Version/s: None
Component/s: None
Description
The ListenRELP processor sometimes does not recover from errors (e.g. RELPFrameException). A manual stop and start of the processor is then necessary to re-establish communication with the client (rsyslog), in particular if such an error occurs more than once.
How to reproduce:
- Enable DEBUG logging for org.apache.nifi.processors.standard.ListenRELP
- Create a simple flow with a ListenRELP processor and set a valid port (e.g. 12345). Leave all other properties at their defaults, in particular Max Number of TCP Connections = 2.
- Connect ListenRELP's output to a funnel and start it.
- Install the tool nc (netcat).
- Use nc to provide some correct and also some invalid data as follows:
Start RELP session on command line:
$ nc 127.0.0.1 12345
Enter the following to open the connection:
1 open 0
Expect the following response:
1 rsp 7 200 OK
Enter the following to submit a valid line:
2 syslog 3 abc
Expect the following response:
2 rsp 6 200 OK
Now enter an invalid line:
3 syslog -1
Expect RELPFrameException in the logs and no response in the nc session.
NiFi will no longer respond on this connection, even for valid lines, which is OK according to the RELP spec.
Press Ctrl-C to end the nc session.
Open a new nc session and repeat the same commands.
It should work a second time, since two TCP connections are allowed.
However, it will not work a third or fourth time: at some point ListenRELP stops responding at all, even on a completely new connection. The only way to recover from this state seems to be to stop and start the processor.
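The nc steps above can also be replayed from a script, which makes it easier to repeat the session several times. The following is only a minimal sketch: the helper names `build_frame` and `run_session` are made up for this example, and it assumes that a single `recv` call returns the whole reply, which is not guaranteed on a real network.

```python
import socket


def build_frame(txnr: int, command: str, data: bytes = b"") -> bytes:
    """Encode one RELP frame: TXNR SP COMMAND SP DATALEN [SP DATA] LF."""
    header = f"{txnr} {command} {len(data)}".encode("ascii")
    if data:
        return header + b" " + data + b"\n"
    return header + b"\n"


# The three frames typed into nc in the steps above.
OPEN_FRAME = build_frame(1, "open")              # "1 open 0"
VALID_FRAME = build_frame(2, "syslog", b"abc")   # "2 syslog 3 abc"
INVALID_FRAME = b"3 syslog -1\n"                 # negative length, written raw on purpose


def run_session(host: str, port: int, timeout: float = 2.0) -> list:
    """Replay one nc session and return the raw reply (or None) for each frame."""
    replies = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        for frame in (OPEN_FRAME, VALID_FRAME, INVALID_FRAME):
            sock.sendall(frame)
            try:
                # Assumption: the reply, if any, arrives in one chunk.
                replies.append(sock.recv(4096))
            except socket.timeout:
                replies.append(None)  # no response within the timeout
    return replies


# Example call against the flow described above (host and port from the steps):
#   for attempt in range(1, 5):
#       print(attempt, run_session("127.0.0.1", 12345))
```

If the behaviour described above occurs, the later attempts would be expected to return `None` (or fail to connect) already for the open frame.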
Also: At some point in time (after all connections have been used up?) the following DEBUG message is printed very often (several times per ms!):
o.a.nifi.processors.standard.ListenRELP ListenRELP[id=<uuid>] No more data available, returning for selection
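To get a rough idea of how often this message is logged, the matching lines can be counted per second. This is a sketch under assumptions: it assumes every log line starts with a `YYYY-MM-DD HH:MM:SS` timestamp (the first 19 characters), and the function names and the threshold of 1000 are arbitrary choices for this example.

```python
from collections import Counter

MESSAGE = "No more data available, returning for selection"


def count_per_second(lines):
    """Count matching ListenRELP lines per timestamp second.

    Assumption: each line starts with 'YYYY-MM-DD HH:MM:SS' (19 characters).
    """
    counts = Counter()
    for line in lines:
        if "ListenRELP" in line and MESSAGE in line:
            counts[line[:19]] += 1
    return counts


def busy_seconds(lines, threshold=1000):
    """Return the seconds in which the message was logged more than `threshold` times."""
    return {sec: n for sec, n in count_per_second(lines).items() if n > threshold}
```

It could be fed an open log file, e.g. `busy_seconds(open(path))`, where `path` is wherever the NiFi application log is written in the given installation.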
This behaviour is a problem for our production setup: it does not happen very often, but it does happen, and data might be lost if this state is not detected and resolved quickly enough.
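One possible way to detect this state from outside is a small probe that opens a RELP session and checks for the expected `rsp` frame. This is only a sketch of a workaround, not a fix: the function name `relp_port_responds` and the timeout are assumptions, and the probe itself occupies one of the allowed TCP connections while it runs.

```python
import socket


def relp_port_responds(host, port, timeout=3.0):
    """Return True if the listener answers a RELP 'open' with an 'rsp' frame."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"1 open 0\n")
            reply = sock.recv(4096)
    except OSError:
        # Covers refused connections as well as timeouts while waiting.
        return False
    # A healthy listener is expected to reply with "1 rsp ..." as in the steps above.
    return reply.startswith(b"1 rsp ")
```

A monitoring job could call `relp_port_responds("127.0.0.1", 12345)` periodically and raise an alert when it returns `False`.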
Disclaimer: Sending an invalid RELP frame is not what happens in our production environment. It's just a simple way to get ListenRELP into this state.
We are not sure about the root cause of the communication interruption; perhaps it is a network/firewall issue. But the result looks very much like what is described here.