Uploaded image for project: 'Apache Ozone'
  1. Apache Ozone
  2. HDDS-9814

NettyClientStreamRpc trying to establish connection during shutdown

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Later
    • None
    • None
    • None
    • None

    Description

      TestOzoneFileSystemWithStreaming logs lots of "connection refused" and "schedule reconnecting" messages during shutdown.

      sample
      2023-11-30 18:39:00,039 [main] INFO  ozone.MiniOzoneClusterImpl (MiniOzoneClusterImpl.java:shutdown(449)) - Shutting down the Mini Ozone Cluster
      
      ...
      
      2023-11-30 18:39:01,126 [Thread-1140] DEBUG client.NettyClientStreamRpc (NettyClientStreamRpc.java:close(452)) - flush? true
      2023-11-30 18:39:01,122 [Thread-1133] WARN  util.PeerProxyMap (PeerProxyMap.java:closeProxy(170)) - 2ad30381-d2ef-4801-b4d0-3965e2d4accc-NettyServerStreamRpc: Failed to close proxy for peer 3c309c61-fe70-460a-bb5b-539134755074|127.0.0.1:15014, proxy class: class org.apache.ratis.client.impl.DataStreamClientImpl
      org.apache.ratis.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: finishConnect(..) failed: Connection refused: /127.0.0.1:15015
      Caused by: java.net.ConnectException: finishConnect(..) failed: Connection refused
      	at org.apache.ratis.thirdparty.io.netty.channel.unix.Errors.newConnectException0(Errors.java:166)
      	at org.apache.ratis.thirdparty.io.netty.channel.unix.Errors.handleConnectErrno(Errors.java:131)
      	at org.apache.ratis.thirdparty.io.netty.channel.unix.Socket.finishConnect(Socket.java:359)
      	at org.apache.ratis.thirdparty.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.doFinishConnect(AbstractEpollChannel.java:710)
      	at org.apache.ratis.thirdparty.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:687)
      	at org.apache.ratis.thirdparty.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:567)
      	at org.apache.ratis.thirdparty.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:499)
      	at org.apache.ratis.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:407)
      	at org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
      	at org.apache.ratis.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
      	at java.lang.Thread.run(Thread.java:750)
      

      This is during a normal, successful test run (logs: TestOzoneFileSystemWithStreaming-output-pass.txt).

      On a related note, test run occasionally times out during shutdown:

      [INFO] Running org.apache.hadoop.fs.ozone.TestOzoneFileSystemWithStreaming
      ...
      [WARNING] Tests run: 1030, Failures: 0, Errors: 0, Skipped: 24
      ...
      [ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M5:test (default-test) on project ozone-integration-test: There was a timeout in the fork -> [Help 1]
      

      (logs: org.apache.hadoop.fs.ozone.TestOzoneFileSystemWithStreaming-output.txt, 2023-05-05T05-46-34_028-jvmRun1.dump)

      More recently I noticed job runner timeout. Unfortunately we don't have logs for that:

      Thu, 30 Nov 2023 13:38:47 GMT
      [INFO] Running org.apache.hadoop.fs.ozone.TestOzoneFileSystemWithStreaming
      Thu, 30 Nov 2023 15:58:17 GMT
      Error: The operation was canceled.
      

      CC szetszwo

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              adoroszlai Attila Doroszlai
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: