Ratis / RATIS-592

One-node Ratis writes fail forever after the first NotLeaderException or LeaderNotReadyException


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.3.0
    • Fix Version/s: 0.4.0
    • Component/s: gRPC
    • Labels: None

    Description

      RATIS-571 modified GrpcClientProtocolClient so that it no longer sets the AsyncStreamObserver reference to null on an exception; however, the ReplyMap reference is still set to null. As a result, when the client retries after a NotLeaderException or a LeaderNotReadyException, it gets an AlreadyClosedException on the stream and never recovers. This is common in unit tests, where a request is sent immediately after the cluster comes up.
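      The failure mode can be reproduced in a toy model. The sketch below is not Ratis code: StreamClient, its reply map, and the local AlreadyClosedException are hypothetical stand-ins for GrpcClientProtocolClient, ReplyMap, and Ratis' exception. It shows that keeping the stream while nulling the reply map makes every later send fail, and that swapping in a fresh map on error (one possible fix, not necessarily the one in the attached patches) lets a retry register again.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

public class ReplyMapSketch {

    /** Hypothetical stand-in for Ratis' AlreadyClosedException. */
    static class AlreadyClosedException extends RuntimeException {
        AlreadyClosedException(String message) {
            super(message);
        }
    }

    /**
     * Hypothetical model of a streaming client: the stream survives an error,
     * but pending calls live in a reply map that the error handler may discard.
     */
    static class StreamClient {
        private final boolean resetReplyMapOnError;
        private final AtomicLong nextCallId = new AtomicLong();
        private final AtomicReference<Map<Long, CompletableFuture<String>>> replies =
                new AtomicReference<>(new ConcurrentHashMap<>());

        StreamClient(boolean resetReplyMapOnError) {
            this.resetReplyMapOnError = resetReplyMapOnError;
        }

        /** Registers a pending call; fails if the reply map has been discarded. */
        CompletableFuture<String> send() {
            Map<Long, CompletableFuture<String>> map = replies.get();
            if (map == null) {
                throw new AlreadyClosedException("reply map is null");
            }
            CompletableFuture<String> future = new CompletableFuture<>();
            map.put(nextCallId.incrementAndGet(), future);
            return future;
        }

        /** Simulates an error (e.g. NotLeaderException) arriving on the stream. */
        void onError(Throwable error) {
            // Buggy variant: null out the map while keeping the stream.
            // Fixed variant: swap in a fresh map so later retries can register calls.
            Map<Long, CompletableFuture<String>> pending =
                    replies.getAndSet(resetReplyMapOnError ? new ConcurrentHashMap<>() : null);
            if (pending != null) {
                // Fail every in-flight call so the caller can decide whether to retry.
                pending.values().forEach(f -> f.completeExceptionally(error));
            }
        }
    }

    /** Sends one request, injects an error, then retries once on the same client. */
    static String retryAfterError(boolean resetReplyMapOnError) {
        StreamClient client = new StreamClient(resetReplyMapOnError);
        CompletableFuture<String> first = client.send();
        client.onError(new IllegalStateException("simulated NotLeaderException"));
        if (!first.isCompletedExceptionally()) {
            return "first call not failed";
        }
        try {
            client.send();
            return "retry registered";
        } catch (AlreadyClosedException e) {
            return "AlreadyClosedException";
        }
    }

    public static void main(String[] args) {
        System.out.println("buggy: " + retryAfterError(false));
        System.out.println("fixed: " + retryAfterError(true));
    }
}
```

      Running main prints "buggy: AlreadyClosedException" and then "fixed: retry registered". The design point is that the error handler should leave the client in a state that a retry can use (or fully close it so a new client is built), rather than half-closing it.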

      Nothing about this bug is specific to one-node Ratis; however, the failing HDDS unit tests all use one-node Ratis. The most probable explanation is that on a three-node ring the client retries a different node each time, which increases the chance of success.
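      The reasoning above can be checked with a small simulation. Everything here is hypothetical: each peer is assumed to have its own client connection, any NotLeaderException or LeaderNotReadyException is assumed to leave that connection permanently closed (the bug), and round-robin retry is a simplifying assumption rather than Ratis' actual retry policy.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RetryAcrossPeersSketch {

    /**
     * Hypothetical model: each peer has its own connection, and any NotLeaderException or
     * LeaderNotReadyException leaves that connection permanently closed.
     *
     * @param peers         peers tried in round-robin order (simplifying assumption)
     * @param leader        the peer that will eventually accept writes
     * @param leaderReadyAt first attempt index at which the leader is ready
     * @param maxAttempts   retry budget
     * @return true if some attempt succeeds
     */
    static boolean writeWithRetries(List<String> peers, String leader, int leaderReadyAt, int maxAttempts) {
        Set<String> closedConnections = new HashSet<>();
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            String peer = peers.get(attempt % peers.size());
            if (closedConnections.contains(peer)) {
                // AlreadyClosedException: a retry on this connection can never succeed.
                continue;
            }
            if (peer.equals(leader) && attempt >= leaderReadyAt) {
                return true;
            }
            // NotLeaderException (wrong peer) or LeaderNotReadyException (leader not ready yet):
            // with the bug, this connection is now unusable.
            closedConnections.add(peer);
        }
        return false;
    }

    public static void main(String[] args) {
        // Leader n2 is ready by the second attempt, which lands on a fresh connection.
        System.out.println("three-node: " + writeWithRetries(List.of("n1", "n2", "n3"), "n2", 1, 10));
        // The single node is the leader but is not ready on the first attempt.
        System.out.println("one-node: " + writeWithRetries(List.of("n1"), "n1", 1, 10));
    }
}
```

      With these inputs the three-node case succeeds and the one-node case fails. Success on three nodes is not guaranteed: if the leader is contacted before it is ready, its connection breaks too, which matches "increases the chance" rather than "fixes".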

      Attachments

        1. RATIS-592.01.patch (2 kB, Siddharth Wagle)
        2. RATIS-592.02.patch (3 kB, Siddharth Wagle)
        3. RATIS-592.03.patch (4 kB, Siddharth Wagle)
        4. RATIS-592.04.patch (13 kB, Siddharth Wagle)
        5. RATIS-592.05.patch (15 kB, Siddharth Wagle)
        6. RATIS-592.06.patch (15 kB, Siddharth Wagle)
        7. RATIS-592.07.patch (15 kB, Siddharth Wagle)
        8. RATIS-592.08.patch (12 kB, Siddharth Wagle)
        9. RATIS-592.09.patch (12 kB, Siddharth Wagle)


            People

              Assignee: Siddharth Wagle
              Reporter: Siddharth Wagle
              Votes: 0
              Watchers: 5

              Dates

                Created:
                Updated:
                Resolved: