Uploaded image for project: 'Ratis'
  1. Ratis
  2. RATIS-1281

TestRetryCacheWithGrpc#testRetryOnResourceUnavailableException fails

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 2.0.0
    • 2.0.0
    • test
    • None

    Description

      This UT always timeout, even though I increase the time.

      [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 60.201 s <<< FAILURE! - in org.apache.ratis.grpc.TestRetryCacheWithGrpc
      [ERROR] testRetryOnResourceUnavailableException(org.apache.ratis.grpc.TestRetryCacheWithGrpc)  Time elapsed: 60.022 s  <<< ERROR!
      org.junit.runners.model.TestTimedOutException: test timed out after 60000 milliseconds
      	at java.io.FileOutputStream.writeBytes(Native Method)
      	at java.io.FileOutputStream.write(FileOutputStream.java:326)
      	at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
      	at java.io.PrintStream.write(PrintStream.java:480)
      	at org.apache.maven.surefire.booter.ForkingRunListener.writeTestOutput(ForkingRunListener.java:208)
      	at org.apache.maven.surefire.report.ConsoleOutputCapture$ForwardingPrintStream.write(ConsoleOutputCapture.java:60)
      	at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
      	at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291)
      	at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:295)
      	at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141)
      	at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
      	at org.apache.log4j.helpers.QuietWriter.flush(QuietWriter.java:59)
      	at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:324)
      	at org.apache.log4j.WriterAppender.append(WriterAppender.java:162)
      	at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
      	at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
      	at org.apache.log4j.Category.callAppenders(Category.java:206)
      	at org.apache.log4j.Category.forcedLog(Category.java:391)
      	at org.apache.log4j.Category.log(Category.java:856)
      	at org.slf4j.impl.Log4jLoggerAdapter.debug(Log4jLoggerAdapter.java:253)
      	at org.apache.ratis.server.impl.RaftServerImpl.submitClientRequestAsync(RaftServerImpl.java:727)
      	at org.apache.ratis.server.impl.RaftServerProxy.lambda$submitClientRequestAsync$9(RaftServerProxy.java:415)
      	at org.apache.ratis.server.impl.RaftServerProxy$$Lambda$678/122031867.apply(Unknown Source)
      	at org.apache.ratis.server.impl.RaftServerProxy.lambda$null$7(RaftServerProxy.java:410)
      	at org.apache.ratis.server.impl.RaftServerProxy$$Lambda$603/413337014.get(Unknown Source)
      	at org.apache.ratis.util.JavaUtils.callAsUnchecked(JavaUtils.java:117)
      	at org.apache.ratis.server.impl.RaftServerProxy.lambda$submitRequest$8(RaftServerProxy.java:410)
      	at org.apache.ratis.server.impl.RaftServerProxy$$Lambda$601/602116397.apply(Unknown Source)
      	at java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:981)
      	at java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2124)
      	at org.apache.ratis.server.impl.RaftServerProxy.submitRequest(RaftServerProxy.java:409)
      	at org.apache.ratis.server.impl.RaftServerProxy.submitClientRequestAsync(RaftServerProxy.java:415)
      	at org.apache.ratis.grpc.TestRetryCacheWithGrpc.testRetryOnResourceUnavailableException(TestRetryCacheWithGrpc.java:84)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:498)
      	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
      	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
      	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
      	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
      	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
      	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.lang.Thread.run(Thread.java:748)
      
      In 

      The first while clause in TestRetryCacheWithGrpc#testRetryOnResourceUnavailableException

      while (!failure.get()) {
        long cid = callId;
        r = cluster.newRaftClientRequest(clientId, leaderProxy.getId(), callId++,
            new RaftTestUtil.SimpleMessage("message"));
        CompletableFuture<RaftClientReply> f = leaderProxy.submitClientRequestAsync(r);
        f.exceptionally(e -> {
            if (e.getCause() instanceof ResourceUnavailableException) {
              RetryCacheTestUtil.isFailed(RetryCacheTestUtil.get(leader, clientId, cid));
              failure.set(true);
            }
            return null;
        });
      }
      
      

      There are 2 requests here, and because followers call blockWriteStateMachineData, the first request will be blocked, and ELEMENT_LIMIT_KEY is 1, the second request will fail and receive ResourceUnavailableException.

      After quittng the first while clause. followers call unblockWriteStateMachineData

      In the second while clause

      
      while (failure.get()) {
        try {
        // retry until the request failed with ResourceUnavailableException succeeds.
        leaderProxy.submitClientRequestAsync(r).get();
        } catch (Exception e) {
          failure.set(false);
        }
      }
      
      

      There are 2 cases: 1. If the first request has been done, the 3rd request will be handled, and later requests will hit retrycache, and server will always return success. It couldn't get Exception and enter infinite loop. 2. If the first request hasn't been done, and the 3rd request has been reach the server, it will get   ResourceUnavailableException and set failure true, and quit the loop.

      Mostly the case 1 occurs, and this UT nearly always fails

       

      So i suggest that quit the loop when request handled successfully as the comments say

      // retry until the request failed with ResourceUnavailableException succeeds.

      Attachments

        Issue Links

          Activity

            People

              ferhui Hui Fei
              ferhui Hui Fei
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 1h 20m
                  1h 20m