  1. Ratis
  2. RATIS-695 Improve running in the face of flakey disks
  3. RATIS-692

RaftStorageDirectory.tryLock throws a very deep IOException


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.5.0
    • Component/s: server

    Description

      Working with our Namazu infrastructure, the first issue I hit when dialing up the faulty I/O injection rate is as follows:

      2019-09-27 14:13:45 ERROR RaftStorageDirectory:336 - Failed to acquire lock on /home/vagrant/test_data/data0_slowed/64656d6f-5261-6674-4772-6f7570313233/in_use.lock. If this storage directory is mounted via NFS, ensure that the appropriate nfs lock services are running.
      java.io.IOException: Input/output error
              at java.io.RandomAccessFile.writeBytes(Native Method)
              at java.io.RandomAccessFile.write(RandomAccessFile.java:512)
              at org.apache.ratis.server.storage.RaftStorageDirectory.tryLock(RaftStorageDirectory.java:327)
              at org.apache.ratis.server.storage.RaftStorageDirectory.lock(RaftStorageDirectory.java:291)
              at org.apache.ratis.server.storage.RaftStorageDirectory.analyzeStorage(RaftStorageDirectory.java:264)
              at org.apache.ratis.server.storage.RaftStorage.analyzeAndRecoverStorage(RaftStorage.java:100)
              at org.apache.ratis.server.storage.RaftStorage.<init>(RaftStorage.java:63)
              at org.apache.ratis.server.impl.ServerState.<init>(ServerState.java:109)
              at org.apache.ratis.server.impl.RaftServerImpl.<init>(RaftServerImpl.java:110)
              at org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$2(RaftServerProxy.java:208)
              at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)
      Exception in thread "main" java.io.IOException: Input/output error
              at java.io.RandomAccessFile.writeBytes(Native Method)
              at java.io.RandomAccessFile.write(RandomAccessFile.java:512)
              at org.apache.ratis.server.storage.RaftStorageDirectory.tryLock(RaftStorageDirectory.java:327)
              at org.apache.ratis.server.storage.RaftStorageDirectory.lock(RaftStorageDirectory.java:291)
              at org.apache.ratis.server.storage.RaftStorageDirectory.analyzeStorage(RaftStorageDirectory.java:264)
              at org.apache.ratis.server.storage.RaftStorage.analyzeAndRecoverStorage(RaftStorage.java:100)
              at org.apache.ratis.server.storage.RaftStorage.<init>(RaftStorage.java:63)
              at org.apache.ratis.server.impl.ServerState.<init>(ServerState.java:109)
              at org.apache.ratis.server.impl.RaftServerImpl.<init>(RaftServerImpl.java:110)
              at org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$2(RaftServerProxy.java:208)
              at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)
      

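      However, nothing in this call chain appears to retry the failed write, so a single I/O error while taking the lock propagates all the way up and terminates the server.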
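
      To make the missing retry concrete, here is a minimal sketch of wrapping a single lock attempt in a bounded retry loop. It is not the Ratis implementation and not the change made in the attached patches. The names `LockRetrySketch`, `IoSupplier`, `retry` and `tryLockOnce`, the attempt count and the sleep interval are all hypothetical. The only real APIs used are `RandomAccessFile` and `FileChannel.tryLock` from the JDK.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only; not part of Ratis.
final class LockRetrySketch {
  /** An I/O operation that may fail transiently. */
  @FunctionalInterface
  interface IoSupplier<T> {
    T get() throws IOException;
  }

  /**
   * Runs op up to maxAttempts times, sleeping sleepMillis between failed
   * attempts. If every attempt fails, the last IOException is rethrown.
   */
  static <T> T retry(IoSupplier<T> op, int maxAttempts, long sleepMillis)
      throws IOException, InterruptedException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return op.get();
      } catch (IOException e) {
        last = e;
        if (attempt < maxAttempts) {
          Thread.sleep(sleepMillis);
        }
      }
    }
    throw last;
  }

  /**
   * One attempt to lock the file and write an owner marker into it.
   * The marker text is an assumption; the trace only shows that a write
   * happens inside tryLock.
   */
  static FileLock tryLockOnce(File lockFile) throws IOException {
    RandomAccessFile file = new RandomAccessFile(lockFile, "rws");
    try {
      FileLock lock = file.getChannel().tryLock();
      if (lock == null) {
        throw new IOException("Lock is held by another process: " + lockFile);
      }
      file.write("sketch-owner".getBytes(StandardCharsets.UTF_8));
      return lock;
    } catch (IOException e) {
      file.close(); // closing the file also releases any lock taken above
      throw e;
    } catch (OverlappingFileLockException e) {
      file.close();
      throw new IOException("Already locked in this JVM: " + lockFile, e);
    }
  }
}
```

      A caller could then use something like `LockRetrySketch.retry(() -> LockRetrySketch.tryLockOnce(lockFile), 3, 100L)`. Note that this sketch retries every `IOException`, including the case where another process really holds the lock. A real fix would likely want to tell transient disk errors apart from a lock that is legitimately in use.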

      Attachments

        1. r692_20190928.patch
          3 kB
          Tsz-wo Sze
        2. r692_20191002.patch
          6 kB
          Tsz-wo Sze
        3. r692_20191003.patch
          6 kB
          Tsz-wo Sze


          People

            Assignee: Tsz-wo Sze (szetszwo)
            Reporter: Clay B. (clayb)
