[ACCUMULO-2488] Concurrent randomwalk balance check needs refinement - ASF JIRA

Details

Type: Test
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 1.4.4
Fix Version/s: 1.4.5, 1.5.2, 1.6.0
Component/s: test
Labels:
- randomwalk
- test

Description

The check for balanced tablets in the randomwalk Concurrent test too easily fails.

Here is a real-life example from the test, giving the number of tablets on each of five tablet servers: 2, 5, 2, 2, 3. (An old, unrelated table contributes to these totals.) This produces a mean of 2.8. The test considers the cluster unbalanced when any server's count differs from the mean by more than the larger of 1 and the mean divided by 5. In this case, 2.8/5 = 0.56 is less than 1, so the allowed deviation is 1, and the second tablet server fails because it has more than 3.8 tablets. Even a count of 4 would fail.
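For illustration, the check as described above can be sketched roughly as follows. This is not the actual randomwalk test code: the function name and its parameters are hypothetical, and it assumes "differs" means the absolute difference from the mean.

```python
def unbalanced_servers(tablet_counts, min_slack=1.0, divisor=5.0):
    """Return the indexes of servers whose tablet count deviates from the
    mean by more than the allowed slack, max(min_slack, mean / divisor).

    Hypothetical helper mirroring the rule described in this issue.
    """
    if not tablet_counts:
        return []
    mean = sum(tablet_counts) / len(tablet_counts)
    slack = max(min_slack, mean / divisor)
    # Assumption: deviation is measured as an absolute difference.
    return [i for i, count in enumerate(tablet_counts)
            if abs(count - mean) > slack]


# Counts from the example: mean is 2.8, slack is max(1, 0.56) = 1.
print(unbalanced_servers([2, 5, 2, 2, 3]))  # prints [1]
```

With so few tablets the slack is pinned at 1, so a single server holding two more tablets than its peers is enough to trip the check.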

Part of the problem in this particular case is that there are so few tablets and so few tablet servers. The cluster also seems content to leave these counts as they are when I repeatedly check it, so the test's definition of balanced is too strict.

The test needs to be refined so that it detects unbalanced conditions using a statistically sound calculation.

Issue Links

relates to

- ACCUMULO-2198 Concurrent randomwalk fails with unbalanced servers (Resolved)
- ACCUMULO-3141 Many RW failures due to balance check (Resolved)
- ACCUMULO-2673 Random walk balance check is still failing too frequently (Resolved)

links to

Review

People

Assignee: Bill Havanki
Reporter: Bill Havanki
Votes: 0
Watchers: 3

Dates

Created: 17/Mar/14 21:11
Updated: 17/Sep/14 19:51
Resolved: 18/Mar/14 20:36