[IGNITE-7134] Never-ending timeout in IgniteSpiOperationTimeoutHelper.nextTimeoutChunk() - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Critical
Resolution: Won't Fix
Affects Version/s: 2.3
Fix Version/s: None
Component/s: general
Labels: None

Description

org.apache.ignite.spi.IgniteSpiOperationTimeoutHelper#nextTimeoutChunk

long curTs = U.currentTimeMillis();

timeout = timeout - (curTs - lastOperStartTs);

The timeout is not decreased at all if the delay between successive calls to nextTimeoutChunk() is smaller than the resolution of U.currentTimeMillis(). This can easily happen when an error occurs right after the nextTimeoutChunk() invocation and the operation is retried immediately.

Only the rare pair of calls that straddles a clock update (the first right before U.currentTimeMillis() changes and the second right after) can decrease the timeout, so the actual IgniteSpiOperationTimeoutHelper timeout could be much longer than failureDetectionTimeout.
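The zero-delta case can be reproduced with a simulated coarse clock. The following is a minimal sketch, not the Ignite implementation: ChunkedTimeoutDemo, coarse() and the injected clock are hypothetical names, and the update of lastOperStartTs after each call is an assumption, since the quoted snippet does not show it.

```java
import java.util.function.LongSupplier;

/** Hypothetical stand-in for the chunked timeout arithmetic quoted above. */
class ChunkedTimeoutDemo {
    private final LongSupplier clock; // Injected so a coarse clock can be simulated.
    private long timeout;
    private long lastOperStartTs;

    ChunkedTimeoutDemo(long failureDetectionTimeout, LongSupplier clock) {
        this.clock = clock;
        this.timeout = failureDetectionTimeout;
        this.lastOperStartTs = clock.getAsLong();
    }

    long nextTimeoutChunk() {
        long curTs = clock.getAsLong();

        // Same arithmetic as in the quoted snippet.
        timeout = timeout - (curTs - lastOperStartTs);

        // Assumption: the quoted snippet does not show this update.
        lastOperStartTs = curTs;

        return timeout;
    }

    /** Rounds a fine-grained time down to a multiple of the given resolution. */
    static long coarse(long preciseMillis, long resolution) {
        return preciseMillis - preciseMillis % resolution;
    }

    public static void main(String[] args) {
        long[] precise = {1000}; // "Real" time in ms, advanced by hand.

        // Simulated clock that only changes every 10 ms.
        ChunkedTimeoutDemo helper =
            new ChunkedTimeoutDemo(100, () -> coarse(precise[0], 10));

        precise[0] = 1003;
        System.out.println(helper.nextTimeoutChunk()); // 100: same tick, delta is 0

        precise[0] = 1009;
        System.out.println(helper.nextTimeoutChunk()); // 100: still the same tick

        precise[0] = 1010;
        System.out.println(helper.nextTimeoutChunk()); // 90: the clock moved by one tick
    }
}
```

Injecting the clock as a LongSupplier is only a device for making the tick boundaries deterministic in the sketch.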

In my opinion, failureDetectionTimeout should not be split between network operations. Instead, record the timestamp of the first operation on the first call to nextTimeoutChunk(), and then calculate the remaining timeout from the difference between the current timestamp and that first-operation timestamp.
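The proposal above could look roughly like the following. This is a sketch under stated assumptions, not the change from the linked pull request: DeadlineTimeoutSketch, firstOperStartTs and the injected clock are hypothetical names, and IllegalStateException is only a stand-in for whatever the real helper uses to signal an exhausted timeout.

```java
import java.util.function.LongSupplier;

/** Hypothetical helper that measures every chunk against the first call. */
class DeadlineTimeoutSketch {
    private final long failureDetectionTimeout;
    private final LongSupplier clock; // Injected for testability (assumption).
    private long firstOperStartTs = -1; // Unset until the first call.

    DeadlineTimeoutSketch(long failureDetectionTimeout, LongSupplier clock) {
        this.failureDetectionTimeout = failureDetectionTimeout;
        this.clock = clock;
    }

    /** Returns the remaining budget in ms, or throws once it is used up. */
    long nextTimeoutChunk() {
        long curTs = clock.getAsLong();

        // Initialize the first operation timestamp on the first call.
        if (firstOperStartTs < 0)
            firstOperStartTs = curTs;

        // Remaining time depends only on the first timestamp and the current one.
        long remaining = failureDetectionTimeout - (curTs - firstOperStartTs);

        if (remaining <= 0)
            throw new IllegalStateException("Failure detection timeout exhausted");

        return remaining;
    }

    public static void main(String[] args) {
        long[] now = {1000}; // Simulated clock in ms, advanced by hand.
        DeadlineTimeoutSketch helper = new DeadlineTimeoutSketch(100, () -> now[0]);

        System.out.println(helper.nextTimeoutChunk()); // 100: first call starts the budget

        now[0] = 1040;
        System.out.println(helper.nextTimeoutChunk()); // 60: 40 ms elapsed since the first call

        now[0] = 1100;
        try {
            helper.nextTimeoutChunk();
        }
        catch (IllegalStateException e) {
            System.out.println("timed out"); // The full 100 ms budget has elapsed.
        }
    }
}
```

Because the remaining time is always derived from a single fixed start timestamp, no per-call state has to be carried between retries.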

Issue Links

is broken by

IGNITE-7152 Failure detection timeout don't work on permanent send message errors causing infinite loop

Resolved

links to

GitHub Pull Request #3166

People

Assignee: Alexandr Kuramshin

Reporter: Alexandr Kuramshin

Votes: 0

Watchers: 3

Dates

Created: 07/Dec/17 08:19

Updated: 02/Aug/19 11:13

Resolved: 11/Dec/17 11:34

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 20m