Kudu / KUDU-2634

token_signer-itest can get stuck when the cluster is shutting down while the leader master generates a new TSK

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 1.8.0
    • Fix Version/s: NA
    • Component/s: test
    • Labels:
      None

      Description

      I saw the following sequence of events in token_signer-itest:

      1. The test body finishes. The InternalMiniCluster is being shut down as part of cleaning up the test.
      2. The follower masters shut down.
      3. The leader master starts shutting down (Master::Shutdown()). The catalog manager is shutting down the background tasks (CatalogManagerBgTasks::Shutdown()), and so is joining the bg task thread.
      4. The bg task thread is in the middle of CatalogManagerBgTasks::Run(), where, because of the test's short TSK rotation times, it detects that it needs to generate a new TSK (token signing key). It calls through to SysCatalogTable::SyncWrite to write the new TSK.
      5. The other two masters are shut down, so SyncWrite blocks forever waiting for the TSK write to replicate.
      6. The test eventually times out because the itest thread is stuck in CatalogManagerBgTasks::Shutdown() waiting for SysCatalogTable::SyncWrite().
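
The steps above follow a general pattern: a shutdown path joins a thread that is blocked on a wait which can no longer be satisfied. Below is a minimal sketch of that pattern, and of one possible way to bound it, using only the C++ standard library. `ReplicatedWrite` and `BgTaskRunner` are hypothetical names invented for illustration; they are not Kudu classes, and this is not a description of how Kudu behaves or was changed.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a synchronous, replicated write (NOT Kudu's API).
class ReplicatedWrite {
 public:
  enum class Result { kReplicated, kAborted, kTimedOut };

  // Blocks until the write is acknowledged, the write is aborted,
  // or the timeout expires, whichever comes first.
  Result Wait(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mu_);
    const bool woke =
        cv_.wait_for(lock, timeout, [this] { return acked_ || aborted_; });
    if (!woke) return Result::kTimedOut;
    return acked_ ? Result::kReplicated : Result::kAborted;
  }

  // Called when a majority of peers has accepted the write.
  void Ack() {
    { std::lock_guard<std::mutex> lock(mu_); acked_ = true; }
    cv_.notify_all();
  }

  // Called on shutdown so that a blocked waiter can return.
  void Abort() {
    { std::lock_guard<std::mutex> lock(mu_); aborted_ = true; }
    cv_.notify_all();
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  bool acked_ = false;
  bool aborted_ = false;
};

// Hypothetical background-task runner whose thread performs one write.
class BgTaskRunner {
 public:
  explicit BgTaskRunner(ReplicatedWrite* write) : write_(write) {}

  void Start(std::chrono::milliseconds timeout) {
    thread_ = std::thread([this, timeout] { result_ = write_->Wait(timeout); });
  }

  // Aborts the pending write BEFORE joining. Joining first, with no peers
  // left to acknowledge the write, would block until the timeout expires
  // (or forever, if the wait were unbounded).
  ReplicatedWrite::Result Shutdown() {
    assert(thread_.joinable());  // Start() must have been called.
    write_->Abort();
    thread_.join();
    return result_;
  }

 private:
  ReplicatedWrite* write_;
  ReplicatedWrite::Result result_ = ReplicatedWrite::Result::kTimedOut;
  std::thread thread_;
};
```

In this sketch, either the abort signal or the bounded wait alone is enough to let the join return. Which of the two (if either) would suit the real catalog manager is an open question that the report does not answer.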

      Log of the failing test attached.

        Attachments

        1. token_signer-itest.log
          487 kB
          William Berkeley

            People

            • Assignee:
              Unassigned
              Reporter:
              wdberkeley William Berkeley
