[KUDU-2376] SIGSEGV while adding and dropping the same range partition and concurrently writing - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.7.0
Fix Version/s: None
Component/s: None
Labels: None

Description

While adding a test to https://gerrit.cloudera.org/#/c/9393/, I found that writing concurrently with a replace tablet operation caused the client to segfault. After inspecting the client code, I believe the same problem can occur if the same range partition is added and dropped while writes are in progress.

Attached is a patch that adds a test to alter_table-test that reliably reproduces the segmentation fault.

I don't fully understand what's happening, but here's what I think I have figured out:

Suppose the range partition P=[0, 100) is dropped and re-added in a single alter. This drops the tablet X for hash bucket 0 and range partition P, and creates a new tablet Y covering the same partition. A batch is pending to X, and the client tries to send it to each of X's replicas in turn. Once the replicas are exhausted, the client tries to find a new leader with MetaCacheServerPicker::PickLeader, which triggers a master lookup to get the latest consensus info for X (#5 in the big comment in PickLeader). This calls LookupTabletByKey, which first attempts a fast-path lookup. If other metadata operations have already cached a tablet for Y, the entry for X will have been removed from the by-table-and-by-key map, so the fast-path lookup will return the entry for Y. The client code can't tell the difference, because these code paths look only at partition boundaries, which are identical for X and Y. As a result, the master lookup never happens, and the client ends up in a fairly tight loop repeating the process above until it segfaults.

I'm not sure exactly what causes the segmentation fault. Looking at it briefly in gdb, the segfault occurred a few calls deep inside STL map code in release mode, and inside a refcount increment in debug mode. I'll try to attach some gdb output showing this later.

The problem is also hinted at in a TODO in PickLeader:

// TODO: When we support tablet splits, we should let the lookup shift
// the write to another tablet (i.e. if it's since been split).

Attachments

alter_table-test.patch (3 kB, uploaded 25/Mar/18 04:36 by William Berkeley)

People

Assignee: Unassigned
Reporter: William Berkeley
Votes: 0
Watchers: 2

Dates

Created: 25/Mar/18 04:49
Updated: 25/Mar/18 05:14