Kudu / KUDU-915

Bootstrap can fail shortly after an alter-table

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Private Beta
    • Fix Version/s: 0.5.0
    • Component/s: tablet
    • Labels: None

    Description

      I saw a test failure which seems to be due to the following sequence:

      1) Log: REPLICATE 1.8 ALTER_SCHEMA
      2) Log: REPLICATE 1.9 WRITE
      3) Log: COMMIT 1.9 WRITE
      4) TabletMetadata::Flush()
      5) crash (before COMMIT 1.8 ALTER_SCHEMA)

      During bootstrap, because we haven't seen a COMMIT message for 1.8, we consider operation 1.9 to be still pending. We rely on the tablet peer's FlushInFlightsToLogCallback to ensure that we don't flush metadata until the COMMIT message is in the log, but that guarantee isn't strong enough: we need to wait until COMMIT messages are in the log for all prior operations, not just all prior writes. The implementation currently uses MvccManager::WaitForAllInFlightToCommit, but since AlterSchema doesn't use MvccManager, we aren't waiting for it.
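
The gap described above can be illustrated with a small, single-threaded C++ sketch. It is not Kudu's code: `InFlightTracker`, `OpType`, and `StateAtMetadataFlush` are hypothetical names invented for this illustration, and operations are identified by log index only (the term in "1.8" is dropped). The sketch contrasts a writes-only check, loosely analogous to waiting on MVCC alone, with a stricter check that also covers operations such as ALTER_SCHEMA.

```cpp
#include <cstdint>
#include <map>

// Hypothetical operation kinds, for illustration only.
enum class OpType { kWrite, kAlterSchema };

// Tracks operations whose REPLICATE message is in the log but whose COMMIT
// message is not yet. This is an illustrative stand-in, not a Kudu class.
class InFlightTracker {
 public:
  // A REPLICATE message for `index` was appended to the log.
  void Replicated(int64_t index, OpType type) { in_flight_[index] = type; }

  // The COMMIT message for `index` was appended to the log.
  void Committed(int64_t index) { in_flight_.erase(index); }

  // Writes-only check: true if no WRITE at or before `index` is still
  // in flight. Non-write operations are ignored, which is the gap.
  bool WritesCommittedThrough(int64_t index) const {
    for (const auto& entry : in_flight_) {
      if (entry.first > index) break;  // map iterates in ascending index order
      if (entry.second == OpType::kWrite) return false;
    }
    return true;
  }

  // Stricter check: true only if every operation at or before `index`,
  // of any type, has its COMMIT message in the log.
  bool AllCommittedThrough(int64_t index) const {
    auto first = in_flight_.begin();
    return first == in_flight_.end() || first->first > index;
  }

 private:
  std::map<int64_t, OpType> in_flight_;
};

// Replays steps 1-3 of the sequence in the description and returns the
// tracker state at step 4, the moment TabletMetadata::Flush() runs.
inline InFlightTracker StateAtMetadataFlush() {
  InFlightTracker tracker;
  tracker.Replicated(8, OpType::kAlterSchema);  // REPLICATE 1.8 ALTER_SCHEMA
  tracker.Replicated(9, OpType::kWrite);        // REPLICATE 1.9 WRITE
  tracker.Committed(9);                         // COMMIT 1.9 WRITE
  return tracker;
}
```

In this model, `WritesCommittedThrough(9)` is already true at step 4, so a flush gated on it would proceed, while `AllCommittedThrough(9)` stays false until the COMMIT for 1.8 is recorded. A real fix would also need to block the flushing thread until the condition holds, which this sketch leaves out.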

            People

              Assignee: Unassigned
              Reporter: Todd Lipcon (tlipcon)