[RATIS-592] One node ratis writes fail forever after first NotLeaderException or LeaderNotReadyException - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: 0.3.0
Fix Version/s: 0.4.0
Component/s: gRPC
Labels: None

Description

~~RATIS-571~~ modified GrpcClientProtocolClient so that the AsyncStreamObserver reference is no longer set to null on an exception; however, the ReplyMap reference is still set to null. As a result, when the client retries after a NotLeaderException or a LeaderNotReadyException, it gets an AlreadyClosedException on the stream and never recovers. This is common in unit tests, where a request is sent immediately after the cluster comes up.

Nothing here is specific to one-node Ratis; however, the HDDS unit tests that fail all use one-node Ratis. The most probable cause is that on a three-node ring the client retries against a different node each time, which increases the chance of success.
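
The failure mode described above can be illustrated with a minimal, self-contained Java sketch. It is not the Ratis code: `StreamStateSketch`, its methods, and the simplified `AlreadyClosedException` are hypothetical stand-ins, and the "reset" handler is only one possible remedy, assumed for illustration, not necessarily what the attached patches do.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical model of a client stream that keeps a map of pending replies. */
class StreamStateSketch {
    /** Simplified stand-in for the exception named in the description. */
    static class AlreadyClosedException extends RuntimeException {
        AlreadyClosedException(String message) {
            super(message);
        }
    }

    // Pending requests keyed by call id; a null map is treated as "closed".
    private volatile Map<Long, String> replies = new ConcurrentHashMap<>();

    /** Registers a pending request, failing if the reply map has been dropped. */
    void send(long callId, String request) {
        Map<Long, String> map = replies;
        if (map == null) {
            throw new AlreadyClosedException("reply map is closed, callId=" + callId);
        }
        map.put(callId, request);
    }

    /** Models the bug: the stream object is kept, but its reply map is nulled. */
    void onRetriableErrorBuggy() {
        replies = null;
    }

    /** Assumed remedy for illustration: start over with a fresh reply map. */
    void onRetriableErrorReset() {
        replies = new ConcurrentHashMap<>();
    }

    /** Returns true if the send succeeded, false if the stream state was closed. */
    boolean trySend(long callId, String request) {
        try {
            send(callId, request);
            return true;
        } catch (AlreadyClosedException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        StreamStateSketch buggy = new StreamStateSketch();
        buggy.onRetriableErrorBuggy();   // e.g. after a simulated NotLeaderException
        System.out.println("retry after buggy handling: " + buggy.trySend(1L, "write"));

        StreamStateSketch reset = new StreamStateSketch();
        reset.onRetriableErrorReset();
        System.out.println("retry after reset handling: " + reset.trySend(1L, "write"));
    }
}
```

In this model every retry on the same object fails once the map is nulled, which matches the "never recovers" behavior in the description. On a one-node cluster the client has no other node to retry against, so it keeps reusing the same broken state.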

Attachments

- RATIS-592.01.patch, 17/Jun/19 18:32, 2 kB, Siddharth Wagle
- RATIS-592.02.patch, 20/Jun/19 19:47, 3 kB, Siddharth Wagle
- RATIS-592.03.patch, 20/Jun/19 21:57, 4 kB, Siddharth Wagle
- RATIS-592.04.patch, 24/Jun/19 06:04, 13 kB, Siddharth Wagle
- RATIS-592.05.patch, 24/Jun/19 15:24, 15 kB, Siddharth Wagle
- RATIS-592.06.patch, 24/Jun/19 22:09, 15 kB, Siddharth Wagle
- RATIS-592.07.patch, 25/Jun/19 04:00, 15 kB, Siddharth Wagle
- RATIS-592.08.patch, 25/Jun/19 05:25, 12 kB, Siddharth Wagle
- RATIS-592.09.patch, 27/Jun/19 18:09, 12 kB, Siddharth Wagle

Issue Links

blocks: HDDS-1555 Disable install snapshot for ContainerStateMachine (Resolved)

People

Assignee: Siddharth Wagle

Reporter: Siddharth Wagle

Votes: 0

Watchers: 5

Dates

Created: 17/Jun/19 18:26

Updated: 28/Jun/19 08:57

Resolved: 28/Jun/19 08:37