Uploaded image for project: 'Apache NiFi'
  1. Apache NiFi
  2. NIFI-4391

PutTCP not properly closing connections

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 1.0.0, 1.1.0, 1.2.0, 1.3.0
    • 1.4.0
    • Extensions
    • None

    Description

      Thread from the mailing list...

      We are using NiFi PutTCP processors to send messages to a number of Moxa
      onCell ip gateway devices.

      These Moxa devices are running on a cellular network with not always the
      most ideal connection. The Moxa only allows for a maximum of 2 simultaneous
      client connections.

      What we notice is that although we specify connection / read timeouts on
      both PutTCP and the Moxa, that sometimes a connection get "stuck". (In the
      moxa network monitoring we see 2 client sockets coming from PutTCP in the
      ESTABLISHED state that never go away).

      This doesn't always happen, but often enough for it to be considered a
      problem, as it requires a restart of the moxa ports to clear the connections
      (manual step). It typically happens when PutTCP experiences a Timeout.

      On the PutTCP processors we have the following settings :

      • Idle Connection Expiration : 30 seconds (we've set this higher due to bad
        gprs connection)
      • Timeout : 10 seconds (this is only used as a timeout for establishing the
        connection)

      On the Moxas we have

      • TCP alive check time : 2min (this should force the Moxa to close the
        socket)

      Yet for some reason the connection remains established.

      Here's what I found out :

      On the moxa I noticed a connection (with client port 48440) that is in
      ESTABLISHED mode for 4+ hours. (blocking other connections). On the Moxa I
      can see when the connection was established :

      2017/09/17 14:20:29 [OpMode] Port01 Connect 10.192.2.90:48440

      I can track that down in Nifi via the logs (unfortunately PutTCP doesn't log
      client ports, but from the timestamp I'm sure it's this connection :

      2017-09-17 14:20:10,837 DEBUG [Timer-Driven Process Thread-10]
      o.apache.nifi.processors.standard.PutTCP
      PutTCP[id=80231a39-1008-1159-a6fa-1f9e3751d608] No available connections,
      creating a new one...
      2017-09-17 14:20:20,860 ERROR [Timer-Driven Process Thread-10]
      o.apache.nifi.processors.standard.PutTCP
      PutTCP[id=80231a39-1008-1159-a6fa-1f9e3751d608] No available connections,
      and unable to create a new one, transferring
      StandardFlowFileRecord[uuid=79f2a166-5211-4d2d-9275-03f0ce4d5b29,claim=StandardContentClaim
      [resourceClaim=StandardResourceClaim[id=1505641210025-1, container=default,
      section=1], offset=84519, length=9],offset=0,name=23934743676390659,size=9]
      to failure: java.net.SocketTimeoutException: Timed out connecting to
      10.32.133.40:4001
      2017-09-17 14:20:20,860 ERROR [Timer-Driven Process Thread-10]
      o.apache.nifi.processors.standard.PutTCP
      java.net.SocketTimeoutException: Timed out connecting to 10.32.133.40:4001
              at
      org.apache.nifi.processor.util.put.sender.SocketChannelSender.open(SocketChannelSender.java:66)
      ~[nifi-processor-utils-1.1.0.jar:1.1.0]
              at
      org.apache.nifi.processor.util.put.AbstractPutEventProcessor.createSender(AbstractPutEventProcessor.java:312)
      ~[nifi-processor-utils-1.1.0.jar:1.1.0]
              at
      org.apache.nifi.processors.standard.PutTCP.createSender(PutTCP.java:121)
      [nifi-standard-processors-1.1.0.jar:1.1.0]
              at
      org.apache.nifi.processor.util.put.AbstractPutEventProcessor.acquireSender(AbstractPutEventProcessor.java:334)
      ~[nifi-processor-utils-1.1.0.jar:1.1.0]
              at
      org.apache.nifi.processors.standard.PutTCP.onTrigger(PutTCP.java:176)
      [nifi-standard-processors-1.1.0.jar:1.1.0]
              at
      org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1099)
      [nifi-framework-core-1.1.0.jar:1.1.0]
              at
      org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:136)
      [nifi-framework-core-1.1.0.jar:1.1.0]
              at
      org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47)
      [nifi-framework-core-1.1.0.jar:1.1.0]
              at
      org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:132)
      [nifi-framework-core-1.1.0.jar:1.1.0]
              at
      java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      [na:1.8.0_111]
              at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
      [na:1.8.0_111]
              at
      java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
      [na:1.8.0_111]
              at
      java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
      [na:1.8.0_111]
              at
      java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      [na:1.8.0_111]
              at
      java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      [na:1.8.0_111]
              at java.lang.Thread.run(Thread.java:745) [na:1.8.0_111]
      

      On an OS level I indeed see the TCP connection originated from Nifi :

      netstat -tn | grep 48440

      tcp 711 0 10.192.2.90:48440 10.32.133.40:4001
      ESTABLISHED

      lsof -i TCP:48440

      COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
      java 3424 root 1864u IPv4 404675057 0t0 TCP
      NifiServer:48440->10.32.133.40:newoak (ESTABLISHED)

      ps -ef | grep 3424

      root 3424 3390 8 11:32 ? 00:44:33 java -classpath
      /opt/nifi-1.1.0/./conf.....

      Any ideas on how to debug this further ?

      I looked at the code in SocketChannelSender.java:66 and was wondering it no
      additional cleanup is necessary when throwing the SocketTimeoutException. If
      the connection ends up getting established after the timeout, doesn't PutTCP
      need to clean that up ?

      Attachments

        Issue Links

          Activity

            People

              bbende Bryan Bende
              bbende Bryan Bende
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: