[KUDU-969] Bootstrap may occasionally mis-identify previously flushed updates - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Affects Version/s: 0.5.0, 0.6.0, 0.7.0
Fix Version/s: 0.8.0
Component/s: tablet
Labels: None
Target Version/s: 0.8.0
Code Review: http://gerrit.cloudera.org:8080/#/c/2333/

Description

tablet_bootstrap has the following TODO:

    if (!FindCopy(flushed_dms_by_drs_id_, target.rs_id(), &last_durable_dms_id)) {
      // if we have no data about this RowSet, then it must have been flushed and
      // then deleted.
      // TODO: how do we avoid a race where we get an update on a rowset before
      // it is persisted? add docs about the ordering of flush.
      return true;
    }

alter_table-randomized-test, when looped in TSAN, seems to fail after around 30 iterations with a sequence like:

a compaction enters the "duplicating" phase
an update arrives, which is duplicated into both the old and the new rowset IDs
- the new rowset ID isn't part of the metadata yet
we get kill -9'ed before we flush the metadata from the compaction

It seems that, because the new rowset ID has no entry in the metadata, bootstrap takes the branch above and mis-identifies the update to the "new" store as already flushed, which can cause the bootstrap to fail (or possibly cause a missing update).
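
The ambiguity can be illustrated with a minimal, self-contained C++ sketch. This is not Kudu's actual code: `UpdateTarget`, `WasStoreAlreadyFlushed`, the local `FindCopy` helper, and the final `dms_id` comparison are simplified stand-ins assumed for illustration. The only part taken from the snippet above is the "no entry means flushed and then deleted" branch.

```cpp
#include <cstdint>
#include <map>

// Hypothetical stand-in for the map helper used in the snippet above:
// copies the mapped value into *out and returns true if the key is present.
template <typename Map>
bool FindCopy(const Map& m, const typename Map::key_type& key,
              typename Map::mapped_type* out) {
  auto it = m.find(key);
  if (it == m.end()) return false;
  *out = it->second;
  return true;
}

// Simplified model of the store targeted by a replayed update
// (assumed shape, not Kudu's real type).
struct UpdateTarget {
  int64_t rs_id;   // rowset the update was applied to
  int64_t dms_id;  // delta store within that rowset
};

// Simplified model of the bootstrap check: true means "already durable,
// skip this update during replay".
bool WasStoreAlreadyFlushed(
    const std::map<int64_t, int64_t>& flushed_dms_by_drs_id,
    const UpdateTarget& target) {
  int64_t last_durable_dms_id = 0;
  if (!FindCopy(flushed_dms_by_drs_id, target.rs_id, &last_durable_dms_id)) {
    // No metadata for this rowset. The check assumes "flushed and then
    // deleted", but a rowset whose metadata was never persisted looks
    // exactly the same from here.
    return true;
  }
  // Assumed comparison for the known-rowset case.
  return target.dms_id <= last_durable_dms_id;
}
```

With durable metadata that knows only the old rowset, for example `{{1, 0}}`, an update duplicated to `UpdateTarget{1, 1}` is reported as not yet flushed, while the same update duplicated to `UpdateTarget{2, 1}` (the compaction output that never reached the metadata) is reported as already flushed. The lookup alone cannot distinguish "deleted after flush" from "never persisted".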

Issue Links

is related to

KUDU-218 Should exercise case where a duplicated insert was flushed from neither store

Resolved

People

Assignee: Todd Lipcon

Reporter: Todd Lipcon

Votes: 0

Watchers: 1

Dates

Created: 10/Aug/15 19:43

Updated: 28/Mar/16 17:45

Resolved: 04/Mar/16 03:51