  1. Kudu
  2. KUDU-969

Bootstrap may occasionally mis-identify previously flushed updates

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.5.0, 0.6.0, 0.7.0
    • Fix Version/s: 0.8.0
    • Component/s: tablet
    • Labels:
      None

      Description

      tablet_bootstrap has the following TODO:

         if (!FindCopy(flushed_dms_by_drs_id_, target.rs_id(), &last_durable_dms_id)) {
           // if we have no data about this RowSet, then it must have been flushed and
           // then deleted.
           // TODO: how do we avoid a race where we get an update on a rowset before
           // it is persisted? add docs about the ordering of flush.
           return true;
         }
      

      alter_table-randomized-test, when looped in TSAN, seems to fail after around 30 iterations with a sequence like:

      • a compaction enters "duplicating" phase
      • an update arrives and is duplicated into both the old and new rowset IDs
        • the new rowset ID isn't part of the metadata yet
      • the process gets kill -9ed before the metadata from the compaction is flushed

      It seems that we then mis-identify the update to the "new" store as already flushed, which can cause the bootstrap to fail (or maybe cause a missing update).
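
      The sequence above can be sketched with a small standalone model. This is only an illustration, not Kudu's actual code: FlushedDmsByRowSet, WasUpdateAlreadyFlushed, DemonstrateRace, the rowset IDs, and the -1 "nothing flushed" marker are all hypothetical names and values, and the DeltaMemStore ID comparison in the found case is an assumption about what follows the quoted snippet.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical, simplified model of the bootstrap check; not Kudu's real types.
// Key: rowset ID. Value: ID of the last DeltaMemStore known durable for it.
using FlushedDmsByRowSet = std::map<int64_t, int64_t>;

// Returns true if replay would skip the update as "already flushed".
// The not-found branch mirrors the quoted snippet; the comparison in the
// found case is an assumption made for this sketch.
bool WasUpdateAlreadyFlushed(const FlushedDmsByRowSet& flushed,
                             int64_t rs_id, int64_t dms_id) {
  auto it = flushed.find(rs_id);
  if (it == flushed.end()) {
    // No metadata for this rowset: assumed flushed and then deleted.
    return true;
  }
  return dms_id <= it->second;
}

// Replays the failure sequence from the description against the model.
void DemonstrateRace() {
  const int64_t kOldRowSet = 1;
  const int64_t kNewRowSet = 2;  // compaction output, never persisted in metadata

  // Metadata as of the crash: only the old rowset is known, and none of its
  // DeltaMemStores has been flushed (-1 is a hypothetical "none" marker).
  FlushedDmsByRowSet flushed = {{kOldRowSet, -1}};

  // The update was duplicated to both rowsets, each with DeltaMemStore ID 0.
  assert(!WasUpdateAlreadyFlushed(flushed, kOldRowSet, 0));  // replayed
  assert(WasUpdateAlreadyFlushed(flushed, kNewRowSet, 0));   // wrongly skipped
}
```

      In this model the check cannot tell "flushed and then deleted" apart from "created but never persisted", which is the ambiguity the TODO asks about.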

              People

              • Assignee:
                tlipcon Todd Lipcon
                Reporter:
                tlipcon Todd Lipcon