[IGNITE-20081] Implement "weakSend" properly, add "weakInvoke" - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels:
- ignite-3
- ignite3_performance

Description

The idea: some components, like RAFT, can tolerate lost messages. Strict delivery guarantees may be unnecessary, or even harmful, for such components.

However, the current implementation of "weakSend" is just a wrapper around "send" that doesn't return a future. This API must be redesigned and properly implemented.

API

CompletableFuture<Void> weakSend(ClusterNode recipient, NetworkMessage msg, long timeout);
CompletableFuture<NetworkMessage> weakInvoke(ClusterNode recipient, NetworkMessage msg, long timeout);

The returned futures complete in one of two cases:

- an ack or a response has been received;
- the timeout has been exceeded.

This means that a huge timeout is probably a bad idea for such messages: if the message is lost, the future stays incomplete until the timeout expires.
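The completion rule above could be sketched as follows. This is only an illustration, not the Ignite implementation: `WeakFutureSketch`, `withTimeout` and `timedOut` are hypothetical names, the timeout is assumed to be in milliseconds, and the ack is simulated by completing the future by hand.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch: a weak future completes on ack/response or on timeout, whichever comes first. */
class WeakFutureSketch {
    /** Arms the given pending future so that it fails with TimeoutException after timeoutMillis. */
    static <T> CompletableFuture<T> withTimeout(CompletableFuture<T> pending, long timeoutMillis) {
        // orTimeout (Java 9+) completes the same future exceptionally if nobody completed it in time.
        return pending.orTimeout(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    /** Waits for the future and reports whether it ended with a timeout. */
    static boolean timedOut(CompletableFuture<?> future) {
        try {
            future.join();
            return false;
        } catch (CompletionException e) {
            return e.getCause() instanceof TimeoutException;
        }
    }

    public static void main(String[] args) {
        // Case 1: the ack arrives before the timeout (simulated by completing the future directly).
        CompletableFuture<String> acked = withTimeout(new CompletableFuture<>(), 1_000);
        acked.complete("ack");
        System.out.println("acked timed out: " + timedOut(acked));

        // Case 2: the message is lost, so only the timeout completes the future.
        CompletableFuture<String> lost = withTimeout(new CompletableFuture<>(), 50);
        System.out.println("lost timed out: " + timedOut(lost));
    }
}
```

In the second case the caller waits for the full timeout before learning anything, which is why a short timeout suits weak messages better.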

Implementation

- With a stable and fast connection, weak communication should work the same way as regular communication from the client's standpoint.
- If the message queue for the given connection is full, we may/should:
  - remove from the existing queue all weak messages that have definitely not been sent yet;
  - reject new weak messages;
  - maybe throttle, but this is out of scope.
- Alternatively, if the connection breaks, we may start removing weak messages from the queue and/or rejecting new ones.
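The "queue is full" policy above could look roughly like the sketch below. Everything here is an assumption made for illustration: `WeakAwareQueue`, its `Entry` type and the fixed capacity are hypothetical, and the ticket does not say what happens to strict messages when the queue is still full after weak ones are dropped (this sketch simply rejects them).

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical bounded outbound queue for a single connection. */
class WeakAwareQueue {
    /** A queued, not yet sent message: payload plus a flag telling whether it is weak. */
    static final class Entry {
        final String payload;
        final boolean weak;

        Entry(String payload, boolean weak) {
            this.payload = payload;
            this.weak = weak;
        }
    }

    private final Deque<Entry> unsent = new ArrayDeque<>();
    private final int capacity;

    WeakAwareQueue(int capacity) {
        this.capacity = capacity;
    }

    /** Enqueues a message and returns true, or returns false if the message is rejected. */
    synchronized boolean offer(String payload, boolean weak) {
        if (unsent.size() >= capacity) {
            // Queue is full: drop weak messages that have definitely not been sent yet.
            unsent.removeIf(e -> e.weak);
            if (weak) {
                return false; // New weak messages are rejected while the queue is under pressure.
            }
            if (unsent.size() >= capacity) {
                return false; // Assumption: a strict message is rejected if there is still no room.
            }
        }
        unsent.addLast(new Entry(payload, weak));
        return true;
    }

    /** Takes the next message to write to the connection, or null if the queue is empty. */
    synchronized Entry poll() {
        return unsent.pollFirst();
    }

    synchronized int size() {
        return unsent.size();
    }

    public static void main(String[] args) {
        WeakAwareQueue queue = new WeakAwareQueue(2);
        queue.offer("weak-1", true);
        queue.offer("strict-1", false);
        // The queue is full now: "weak-1" is dropped and "weak-2" is rejected.
        System.out.println("weak-2 accepted: " + queue.offer("weak-2", true));
        System.out.println("queue size: " + queue.size());
    }
}
```

A real implementation would also have to complete the futures of the dropped and rejected weak messages; that part is omitted here.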

Weak send and weak invoke may behave differently.

For example, "weakSend" requires an ack, so it has to be marked with a "message number" in the recovery descriptor.
"weakInvoke", however, doesn't need an ack. It only requires a response (and already has a "correlationId"), so not re-sending it after a reconnect shouldn't break the recovery protocol. It doesn't need a "message number" in the recovery descriptor, so we can save some resources by reducing the number of acks.

One more important thing:

- When an invoke future fails with a timeout exception, we must clean up the corresponding correlation ID from the map.
- When we receive a "node left" event for some node, we should complete all futures returned for that node with some "NodeLeftException", and clean up all of its correlation IDs from the map as well.
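A minimal sketch of the two cleanup rules above, under stated assumptions: `PendingInvokes`, `Invocation`, string node IDs and string responses are hypothetical stand-ins for the real types, and `NodeLeftException` is defined locally because the ticket only names it tentatively.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical exception for futures whose recipient left the cluster. */
class NodeLeftException extends RuntimeException {
    NodeLeftException(String nodeId) {
        super("Node left: " + nodeId);
    }
}

/** Hypothetical registry of in-flight invokes, keyed by correlation ID. */
class PendingInvokes {
    /** Handle given to the caller: the correlation ID and the future to wait on. */
    static final class Invocation {
        final long correlationId;
        final CompletableFuture<String> future;

        Invocation(long correlationId, CompletableFuture<String> future) {
            this.correlationId = correlationId;
            this.future = future;
        }
    }

    private static final class Pending {
        final String nodeId;
        final CompletableFuture<String> response = new CompletableFuture<>();

        Pending(String nodeId) {
            this.nodeId = nodeId;
        }
    }

    private final Map<Long, Pending> byCorrelationId = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong();

    /** Registers an invoke to the given node; the future fails after timeoutMillis. */
    Invocation register(String nodeId, long timeoutMillis) {
        long id = nextId.incrementAndGet();
        Pending pending = new Pending(nodeId);
        byCorrelationId.put(id, pending);
        pending.response.orTimeout(timeoutMillis, TimeUnit.MILLISECONDS);
        // The callback runs on response, timeout and node-left alike, so the entry never leaks.
        // The dependent future is returned, so callers observe completion only after the cleanup.
        return new Invocation(id, pending.response.whenComplete((res, err) -> byCorrelationId.remove(id)));
    }

    /** Completes the matching future, if it is still pending. */
    void onResponse(long correlationId, String response) {
        Pending pending = byCorrelationId.get(correlationId);
        if (pending != null) {
            pending.response.complete(response);
        }
    }

    /** Fails every pending invoke addressed to the node that left. */
    void onNodeLeft(String nodeId) {
        // ConcurrentHashMap iteration tolerates the removals done by the completion callback.
        byCorrelationId.values().forEach(pending -> {
            if (pending.nodeId.equals(nodeId)) {
                pending.response.completeExceptionally(new NodeLeftException(nodeId));
            }
        });
    }

    int pendingCount() {
        return byCorrelationId.size();
    }

    public static void main(String[] args) {
        PendingInvokes invokes = new PendingInvokes();
        Invocation first = invokes.register("node-1", 10_000);
        invokes.register("node-2", 10_000);
        invokes.onNodeLeft("node-1");
        System.out.println("first failed: " + first.future.isCompletedExceptionally());
        System.out.println("still pending: " + invokes.pendingCount());
    }
}
```

Tying the cleanup to the future's completion, rather than to each individual code path, is one way to keep the map from accumulating stale correlation IDs.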

Integration

Integration will be done separately. All we need for now is a set of unit tests.

People

Assignee: Unassigned

Reporter: Ivan Bessonov

Votes: 0

Watchers: 1

Dates

Created: 28/Jul/23 09:29

Updated: 22/Aug/23 12:50