Geode / GEODE-7062

CI Failure: DistributedLockServiceDUnitTest.testSuspendLockingBlocksUntilNoLocks

Details

    Description

      The test testSuspendLockingBlocksUntilNoLocks from class DistributedLockServiceDUnitTest failed twice in CI runs 967 and 969.
      Results for the first failure are available here and for the second one here.
      Archived artifacts for the first failure are available here and for the second one here.

      The issue appears to be a race condition when launching an asynchronous invocation on a remote VM through the following code:

      DistributedLockServiceDUnitTest.java
          VM vm1 = getVM(1);
          vm1.invokeAsync(new SerializableRunnable("Lock & unlock in vm1") {
            @Override
            public void run() {
              DistributedLockService service2 = getServiceNamed(name);
              assertThat(service2.lock("lock", -1, -1)).isTrue();
              synchronized (monitor) {
                try {
                  monitor.wait();
                } catch (InterruptedException ex) {
                  out.println("Unexpected InterruptedException");
                  fail("interrupted");
                }
              }
              service2.unlock("lock");
            }
          });
          // Let vm1's thread get the lock and go into wait()
          sleep(100);
      

      If the thread on the remote VM has not been launched and acquired the lock by the time the 100-millisecond sleep ends, the test fails, because the thread on the local VM finds no lock held and its call to suspendLocking returns right away:

      DistributedLockServiceDUnitTest.java
          Thread thread = new Thread(new Runnable() {
            @Override
            public void run() {
              setGot(service.suspendLocking(-1));
              setDone(true);
              service.resumeLocking();
            }
          });
          setGot(false);
          setDone(false);
          thread.start();
      
          // Let thread start, make sure it's blocked in suspendLocking
          sleep(100);
          assertThat(getGot() || getDone())
              .withFailMessage("Before release, got: " + getGot() + ", done: " + getDone()).isFalse();
      

      Increasing the sleep time might reduce recurrences of the issue. Another option would be to investigate how to make the test wait until the asynchronous invocation has actually started on the remote VM, instead of arbitrarily sleeping for 100 milliseconds.
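
      As a rough illustration of the "wait instead of sleep" option, here is a minimal single-JVM sketch. It is not the actual fix for this test: the class AwaitInsteadOfSleep and the helper awaitCondition are hypothetical names, a ReentrantLock stands in for the distributed lock, and a CountDownLatch stands in for the monitor. In the real DUnit test the condition would have to be evaluated on vm1 (for example through a remote invocation that reports whether the lock is held), which is an assumption not covered here.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.BooleanSupplier;

// Hypothetical sketch: poll for a condition with a timeout instead of sleeping a fixed time.
public class AwaitInsteadOfSleep {

  /** Polls the condition until it holds or the timeout elapses; returns whether it held. */
  public static boolean awaitCondition(BooleanSupplier condition, long timeoutMillis)
      throws InterruptedException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        return false; // timed out without seeing the condition
      }
      Thread.sleep(10); // short poll interval, not a guess at how long startup takes
    }
    return true;
  }

  /** Returns true if the holder thread owned the lock by the time the wait finished. */
  public static boolean runScenario() throws InterruptedException {
    ReentrantLock lock = new ReentrantLock(); // stand-in for the distributed lock
    CountDownLatch release = new CountDownLatch(1); // stand-in for notifying the monitor

    // Plays the role of the asynchronous invocation on vm1: lock, wait, unlock.
    Thread holder = new Thread(() -> {
      lock.lock();
      try {
        release.await();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        lock.unlock();
      }
    });
    holder.start();

    // Replaces sleep(100): proceed only once the lock is observably held.
    boolean held = awaitCondition(lock::isLocked, 5_000);

    release.countDown(); // let the holder unlock and finish
    holder.join();
    return held;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("lock held before proceeding: " + runScenario());
  }
}
```

      The point of the sketch is that the test only moves on to the suspendLocking step after it has observed the lock being held, so a slow thread start delays the test rather than failing it. The timeout still bounds the wait if the remote thread never starts.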

            People

              jjramos Juan Ramos

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m