[IGNITE-10469] TcpCommunicationSpi does not break tcp connection after IdleConnectionTimeout seconds of inactivity - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Cannot Reproduce
Affects Version/s: 2.5, 2.6
Fix Version/s: 2.7
Component/s: cache
Labels:
None

Description

TcpCommunicationSpi does not close TCP connections after they have been idle for longer than the timeout configured in TcpCommunicationSpi#idleConnTimeout (default: 10 minutes).

In some environments idle TCP connections become unusable: they remain ESTABLISHED while the data to be sent piles up in Send-Q (according to netstat). As a result, the Ignite stack does not recognize the communication problem for a considerable amount of time (~10-15 minutes) and does not begin its reconnection procedure. Heartbeats use different TCP connections that are not idle and do not have this issue.

I have found that the Ignite code does contain logic to detect and close idle connections, but due to a problem in the code it does not work reliably.

This is a test that sometimes reproduces the problem.
ignite_idle_test.zip - full test project
GridTcpCommunicationSpiIdleCommunicationTimeoutTest.java - just test code
2.6.0.txt - mvn clean install logs for test with Ignite 2.6.0

What's the problem in the Ignite code?

There are two loops in the Ignite code that have a chance to close idle connections:
1) org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.CommunicationWorker#processIdle - this one is executed every IdleConnectionTimeout milliseconds. It can close idle connections, but it typically concludes that the connection is not idle because of the second loop.
2) org.apache.ignite.internal.util.nio.GridNioServer.AbstractNioClientWorker#bodyInternal -> org.apache.ignite.internal.util.nio.GridNioServer.AbstractNioClientWorker#checkIdle - this loop executes:

filterChain.onSessionIdleTimeout(ses); // <-- does not actually close an idle connection
// Update timestamp to avoid multiple notifications within one timeout interval.
ses.resetSendScheduleTime(); // <-- resets idle timer
ses.bytesReceived(0);
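
The interaction between the two loops described above can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions, not Ignite code: IdleLoopModel, ModelSession and every method name are hypothetical, the periods (10 s idle timeout, 2 s NIO check period, 500 ms worker offset) are made up, and the model assumes the NIO check reaches the session before the worker does in each interval.

```java
/** Hypothetical model of the two idle-checking loops; NOT the real Ignite classes. */
final class IdleLoopModel {
    /** Simplified stand-in for a communication session. */
    static final class ModelSession {
        private long lastActivityMs;
        private boolean closed;

        ModelSession(long nowMs) { this.lastActivityMs = nowMs; }

        long idleMs(long nowMs) { return nowMs - lastActivityMs; }
        void resetIdleTimer(long nowMs) { lastActivityMs = nowMs; }
        void close() { closed = true; }
        boolean isClosed() { return closed; }
    }

    /** Analogue of loop 2: on idle timeout it only notifies and, optionally, resets the timer. */
    static void nioCheckIdle(ModelSession ses, long nowMs, long idleTimeoutMs, boolean resetTimer) {
        if (ses.idleMs(nowMs) >= idleTimeoutMs && resetTimer)
            ses.resetIdleTimer(nowMs); // the session stays open, but now looks active again
    }

    /** Analogue of loop 1: closes the session if it looks idle. */
    static void processIdle(ModelSession ses, long nowMs, long idleTimeoutMs) {
        if (ses.idleMs(nowMs) >= idleTimeoutMs)
            ses.close();
    }

    /** Steps through simulated time in 1 ms ticks; returns true if the session was closed. */
    static boolean simulate(long idleTimeoutMs, long nioPeriodMs, long workerOffsetMs,
                            long durationMs, boolean nioResetsTimer) {
        ModelSession ses = new ModelSession(0);
        for (long now = 1; now <= durationMs && !ses.isClosed(); now++) {
            if (now % nioPeriodMs == 0)
                nioCheckIdle(ses, now, idleTimeoutMs, nioResetsTimer);
            // The worker wakes once per idle timeout, slightly after the NIO check.
            if (now > workerOffsetMs && (now - workerOffsetMs) % idleTimeoutMs == 0)
                processIdle(ses, now, idleTimeoutMs);
        }
        return ses.isClosed();
    }

    public static void main(String[] args) {
        System.out.println("closed (timer reset by NIO loop): "
            + simulate(10_000, 2_000, 500, 60_000, true));
        System.out.println("closed (no timer reset): "
            + simulate(10_000, 2_000, 500, 60_000, false));
    }
}
```

In this model the session is never closed while the NIO-side check keeps resetting the timer, and it is closed on the worker's first wake-up once the reset is disabled. The real code is subject to thread timing, which would be consistent with the test reproducing the problem only sometimes.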

—
To sum up, maybe the whole approach should be reviewed:

- is it OK not to track message delivery time?
- is it OK not to do heartbeating using the same connections as for get/put/... commands?
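
As an illustration of what tracking message delivery time could look like, here is a minimal sketch. It is not taken from Ignite: DeliveryTracker, onSend, onAck and isStalled are hypothetical names, and the sketch assumes that the peer acknowledges messages in the order they were sent.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical per-connection tracker of unacknowledged messages; not part of Ignite. */
final class DeliveryTracker {
    /** Send timestamps of messages that have not been acknowledged yet, oldest first. */
    private final Deque<Long> pendingSendTimesMs = new ArrayDeque<>();
    private final long maxDeliveryMs;

    DeliveryTracker(long maxDeliveryMs) {
        this.maxDeliveryMs = maxDeliveryMs;
    }

    /** Records that a message was handed to the connection at the given time. */
    void onSend(long nowMs) {
        pendingSendTimesMs.addLast(nowMs);
    }

    /** Records an acknowledgement for the oldest pending message (in-order acks assumed). */
    void onAck() {
        pendingSendTimesMs.pollFirst();
    }

    /** True if the oldest unacknowledged message has been pending longer than allowed. */
    boolean isStalled(long nowMs) {
        Long oldest = pendingSendTimesMs.peekFirst();
        return oldest != null && nowMs - oldest > maxDeliveryMs;
    }

    public static void main(String[] args) {
        DeliveryTracker tracker = new DeliveryTracker(5_000);
        tracker.onSend(0);
        System.out.println("stalled at 4 s: " + tracker.isStalled(4_000));
        System.out.println("stalled at 6 s: " + tracker.isStalled(6_000));
        tracker.onAck();
        System.out.println("stalled after ack: " + tracker.isStalled(6_000));
    }
}
```

A connection flagged as stalled by such a tracker could be closed and re-established without waiting for the TCP stack to give up, which is the situation with data piling up in Send-Q described in this issue.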

Attachments


ignite_idle_test.zip - 29/Nov/18 11:55 - 5 kB - Igor Kamyshnikov
GridTcpCommunicationSpiIdleCommunicationTimeoutTest.java - 29/Nov/18 11:56 - 7 kB - Igor Kamyshnikov
GridTcpCommunicationSpiIdleCommunicationTimeoutTest.java - 08/Dec/18 09:47 - 7 kB - Evgeny Stanilovsky
2.6.0.txt - 29/Nov/18 12:24 - 93 kB - Igor Kamyshnikov

People

Assignee: Evgeny Stanilovsky

Reporter: Igor Kamyshnikov

Votes: 0

Watchers: 3

Dates

Created: 29/Nov/18 12:22

Updated: 11/Dec/18 07:17

Resolved: 11/Dec/18 07:17