Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
1.0.0, 1.1.0, 1.2.0, 1.3.0
-
None
Description
Thread from the mailing list...
We are using NiFi PutTCP processors to send messages to a number of Moxa
onCell ip gateway devices.
These Moxa devices are running on a cellular network with not always the
most ideal connection. The Moxa only allows for a maximum of 2 simultaneous
client connections.
What we notice is that although we specify connection / read timeouts on
both PutTCP and the Moxa, that sometimes a connection get "stuck". (In the
moxa network monitoring we see 2 client sockets coming from PutTCP in the
ESTABLISHED state that never go away).
This doesn't always happen, but often enough for it to be considered a
problem, as it requires a restart of the moxa ports to clear the connections
(manual step). It typically happens when PutTCP experiences a Timeout.
On the PutTCP processors we have the following settings :
- Idle Connection Expiration : 30 seconds (we've set this higher due to bad
gprs connection) - Timeout : 10 seconds (this is only used as a timeout for establishing the
connection)
On the Moxas we have
- TCP alive check time : 2min (this should force the Moxa to close the
socket)
Yet for some reason the connection remains established.
Here's what I found out :
On the moxa I noticed a connection (with client port 48440) that is in
ESTABLISHED mode for 4+ hours. (blocking other connections). On the Moxa I
can see when the connection was established :
2017/09/17 14:20:29 [OpMode] Port01 Connect 10.192.2.90:48440
I can track that down in Nifi via the logs (unfortunately PutTCP doesn't log
client ports, but from the timestamp I'm sure it's this connection :
2017-09-17 14:20:10,837 DEBUG [Timer-Driven Process Thread-10] o.apache.nifi.processors.standard.PutTCP PutTCP[id=80231a39-1008-1159-a6fa-1f9e3751d608] No available connections, creating a new one... 2017-09-17 14:20:20,860 ERROR [Timer-Driven Process Thread-10] o.apache.nifi.processors.standard.PutTCP PutTCP[id=80231a39-1008-1159-a6fa-1f9e3751d608] No available connections, and unable to create a new one, transferring StandardFlowFileRecord[uuid=79f2a166-5211-4d2d-9275-03f0ce4d5b29,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1505641210025-1, container=default, section=1], offset=84519, length=9],offset=0,name=23934743676390659,size=9] to failure: java.net.SocketTimeoutException: Timed out connecting to 10.32.133.40:4001 2017-09-17 14:20:20,860 ERROR [Timer-Driven Process Thread-10] o.apache.nifi.processors.standard.PutTCP java.net.SocketTimeoutException: Timed out connecting to 10.32.133.40:4001 at org.apache.nifi.processor.util.put.sender.SocketChannelSender.open(SocketChannelSender.java:66) ~[nifi-processor-utils-1.1.0.jar:1.1.0] at org.apache.nifi.processor.util.put.AbstractPutEventProcessor.createSender(AbstractPutEventProcessor.java:312) ~[nifi-processor-utils-1.1.0.jar:1.1.0] at org.apache.nifi.processors.standard.PutTCP.createSender(PutTCP.java:121) [nifi-standard-processors-1.1.0.jar:1.1.0] at org.apache.nifi.processor.util.put.AbstractPutEventProcessor.acquireSender(AbstractPutEventProcessor.java:334) ~[nifi-processor-utils-1.1.0.jar:1.1.0] at org.apache.nifi.processors.standard.PutTCP.onTrigger(PutTCP.java:176) [nifi-standard-processors-1.1.0.jar:1.1.0] at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1099) [nifi-framework-core-1.1.0.jar:1.1.0] at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:136) [nifi-framework-core-1.1.0.jar:1.1.0] at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47) [nifi-framework-core-1.1.0.jar:1.1.0] at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:132) [nifi-framework-core-1.1.0.jar:1.1.0] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_111] at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_111] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_111] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_111] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_111] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_111] at java.lang.Thread.run(Thread.java:745) [na:1.8.0_111]
On an OS level I indeed see the TCP connection originated from Nifi :
netstat -tn | grep 48440
tcp 711 0 10.192.2.90:48440 10.32.133.40:4001
ESTABLISHED
lsof -i TCP:48440
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 3424 root 1864u IPv4 404675057 0t0 TCP
NifiServer:48440->10.32.133.40:newoak (ESTABLISHED)
ps -ef | grep 3424
root 3424 3390 8 11:32 ? 00:44:33 java -classpath
/opt/nifi-1.1.0/./conf.....
Any ideas on how to debug this further ?
I looked at the code in SocketChannelSender.java:66 and was wondering it no
additional cleanup is necessary when throwing the SocketTimeoutException. If
the connection ends up getting established after the timeout, doesn't PutTCP
need to clean that up ?
Attachments
Issue Links
- links to