Kudu / KUDU-969

Bootstrap may occasionally mis-identify previously flushed updates


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.5.0, 0.6.0, 0.7.0
    • Fix Version/s: 0.8.0
    • Component/s: tablet
    • Labels: None

    Description

      tablet_bootstrap has the following TODO:

        if (!FindCopy(flushed_dms_by_drs_id_, target.rs_id(), &last_durable_dms_id)) {
          // if we have no data about this RowSet, then it must have been flushed and
          // then deleted.
          // TODO: how do we avoid a race where we get an update on a rowset before
          // it is persisted? add docs about the ordering of flush.
          return true;
        }
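
      The check above can be modeled in isolation. The sketch below is a minimal, self-contained approximation, not Kudu's code: `MutationTarget`, `FlushedDmsByDrsId`, and `IsTargetAlreadyFlushed` are hypothetical names, a `std::unordered_map` lookup stands in for `FindCopy`, and it assumes a mutation counts as flushed when its DMS ID is at or below the last durable DMS ID recorded for its rowset.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical model of a replayed mutation's target: the disk rowset (DRS)
// ID and the delta memstore (DMS) ID the update was applied to.
struct MutationTarget {
  int64_t rs_id;
  int64_t dms_id;
};

// Hypothetical stand-in for flushed_dms_by_drs_id_: for each rowset recorded
// in the durable tablet metadata, the ID of its last flushed (durable) DMS.
using FlushedDmsByDrsId = std::unordered_map<int64_t, int64_t>;

// Returns true if replay should treat the mutation as already flushed.
bool IsTargetAlreadyFlushed(const FlushedDmsByDrsId& flushed,
                            const MutationTarget& target) {
  auto it = flushed.find(target.rs_id);
  if (it == flushed.end()) {
    // Same fallback as the TODO: a rowset missing from the metadata is
    // assumed to have been flushed and then deleted.
    return true;
  }
  // Assumption: a DMS at or below the last durable DMS ID is on disk.
  return target.dms_id <= it->second;
}

// Example: rowset 1 is in the metadata and its DMS 0 has been flushed.
void CheckExamples() {
  const FlushedDmsByDrsId flushed = {{1, 0}};
  assert(IsTargetAlreadyFlushed(flushed, {1, 0}));   // flushed DMS: skip
  assert(!IsTargetAlreadyFlushed(flushed, {1, 1}));  // newer DMS: replay
  assert(IsTargetAlreadyFlushed(flushed, {2, 0}));   // unknown rowset: skip
}
```

      The third case is the one the TODO is about: the lookup cannot tell a rowset that was flushed and then deleted apart from one that was never persisted to the metadata.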
      

      When looped under TSAN, alter_table-randomized-test seems to fail after around 30 iterations, with a sequence like the following:

      • a compaction enters the "duplicating" phase
      • an update arrives and is duplicated into both the old and the new rowsets
        • the new rowset ID isn't part of the metadata yet
      • the process is killed with kill -9 before the compaction's metadata is flushed

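      On restart, bootstrap then seems to mis-identify the update to the "new" store as already flushed: because the new rowset ID is not in the durable metadata, the FindCopy() lookup above fails and the code returns true. This can cause the bootstrap to fail (or possibly lead to a missing update).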
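
      The failure sequence can be traced with a small model of the replay check. This is a hypothetical sketch with made-up rowset and DMS IDs; it assumes the WAL records both targets of an update applied during the duplicating phase, and the names `IsTargetAlreadyFlushed`, `ReplayDecisions`, and `SimulateCrashDuringDuplication` are illustrative, not Kudu APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical target of one logged update: disk rowset ID and DMS ID.
struct MutationTarget {
  int64_t rs_id;
  int64_t dms_id;
};

// Rowset ID -> last durable DMS ID, as recorded in the durable metadata.
using FlushedDmsByDrsId = std::unordered_map<int64_t, int64_t>;

// Hypothetical replay decision, mirroring the fallback in the TODO.
bool IsTargetAlreadyFlushed(const FlushedDmsByDrsId& flushed,
                            const MutationTarget& t) {
  auto it = flushed.find(t.rs_id);
  if (it == flushed.end()) return true;  // unknown rowset: assumed flushed
  return t.dms_id <= it->second;
}

// For each target of one logged update, records whether bootstrap replays it.
std::vector<bool> ReplayDecisions(const FlushedDmsByDrsId& flushed,
                                  const std::vector<MutationTarget>& targets) {
  std::vector<bool> replayed;
  for (const MutationTarget& t : targets) {
    replayed.push_back(!IsTargetAlreadyFlushed(flushed, t));
  }
  return replayed;
}

// The sequence from the description, with made-up IDs:
//  - durable metadata knows rowset 1, whose DMS 0 is flushed;
//  - a compaction of rowset 1 into new rowset 2 enters the duplicating phase;
//  - an update lands in rowset 1's DMS 1 and is duplicated into rowset 2's DMS 0;
//  - kill -9 arrives before the compaction's metadata (rowset 2) is flushed.
std::vector<bool> SimulateCrashDuringDuplication() {
  const FlushedDmsByDrsId durable_metadata = {{1, 0}};
  const std::vector<MutationTarget> logged_targets = {{1, 1}, {2, 0}};
  return ReplayDecisions(durable_metadata, logged_targets);
}

void CheckExamples() {
  const std::vector<bool> replayed = SimulateCrashDuringDuplication();
  assert(replayed.size() == 2);
  assert(replayed[0]);   // old rowset: update is replayed
  assert(!replayed[1]);  // new rowset: classified as flushed only because
                         // it is absent from the durable metadata
}
```

      In this model the update's target in the new rowset is classified as already flushed purely because rowset 2 is missing from the metadata, not because any of its data is durable.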


          People

            Assignee: Todd Lipcon (tlipcon)
            Reporter: Todd Lipcon (tlipcon)