Ratis / RATIS-591

All create-log request RPCs blocked

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: LogService
    • Labels: None

    Description

      I was trying out Rajeshbabu's new changes from RATIS-541 using the Docker automation, but I gave invalid options the first time, which caused the workers to exit (divide by zero).

      When I tried to rerun the VerificationTool, it got stuck waiting for logs to be created. A thread dump from the active leader of the metadata quorum showed 150+ threads all stuck waiting to acquire a write lock. However, no thread holds the lock that everyone is waiting on, which looks to me like a deadlock.

      It seems like we have some kind of bug where we orphan a lock that's still held. This doesn't happen normally, which makes me wonder whether it can happen when the leader changes. I'll attach the logs of the metadata quorum nodes from my local test. However, I bet this could be reproduced under sufficient load.
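
      To illustrate what an orphaned lock can look like in Java, here is a minimal sketch. It is not taken from the Ratis or LogService code, and the report does not confirm that this is the actual cause. The class and method names (OrphanedLockDemo, failingWork, writeLockFreeAfterWorkerDies) are hypothetical. It assumes a worker thread that takes a ReentrantReadWriteLock write lock and then dies from an unchecked exception without unlocking. The lock stays held even though no live thread owns it, so later writers block, which would match a thread dump with many waiters and no visible holder.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class; not part of Ratis.
public class OrphanedLockDemo {

  // Simulated failure inside the critical section (assumption for illustration).
  static void failingWork() {
    throw new ArithmeticException("/ by zero (simulated)");
  }

  // Returns true if the write lock can be acquired after the worker thread has died.
  static boolean writeLockFreeAfterWorkerDies(boolean useFinally) {
    ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    Thread worker = new Thread(() -> {
      lock.writeLock().lock();
      if (useFinally) {
        // Safe pattern: the lock is released even when the work throws.
        try {
          failingWork();
        } finally {
          lock.writeLock().unlock();
        }
      } else {
        // Unsafe pattern: the exception kills the thread while the lock is held.
        failingWork();
      }
    });
    // Swallow the simulated exception so it does not clutter stderr.
    worker.setUncaughtExceptionHandler((t, e) -> { });
    worker.start();
    try {
      worker.join();
      // The worker is dead at this point; try to take the write lock ourselves.
      boolean acquired = lock.writeLock().tryLock(200, TimeUnit.MILLISECONDS);
      if (acquired) {
        lock.writeLock().unlock();
      }
      return acquired;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("without finally: " + writeLockFreeAfterWorkerDies(false));
    System.out.println("with finally: " + writeLockFreeAfterWorkerDies(true));
  }
}
```

      In this sketch the unguarded path leaves the lock permanently held (the first line prints false), while the try/finally path releases it (the second prints true). If something similar happens in the metadata quorum, auditing the write-lock call sites for missing try/finally blocks might be a reasonable first step.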

      Can you take a look into this, Vlad?

      Attachments

        1. master-1.txt
          355 kB
          Josh Elser
        2. master-3.txt
          315 kB
          Josh Elser
        3. master-2.txt
          914 kB
          Josh Elser

          People

            Assignee: Vladimir Rodionov (vrodionov)
            Reporter: Josh Elser (elserj)
