Cassandra / CASSANDRA-1042

ColumnFamilyRecordReader returns duplicate rows

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 0.6.5
    • Component/s: None
    • Labels:

      Description

      There's a bug in ColumnFamilyRecordReader that appears when processing a single split (which happens in most tests that have a small number of rows), and potentially in other cases. When the start and end tokens of the split are equal, duplicate rows can be returned.

      Example with 5 rows:
      token (start and end) = 53193025635115934196771903670925341736

      Tokens returned by first get_range_slices iteration (all 5 rows):
      16955237001963240173058271559858726497
      40670782773005619916245995581909898190
      99079589977253916124855502156832923443
      144992942750327304334463589818972416113
      166860289390734216023086131251507064403

      Tokens returned by next iteration (first token is last token from
      previous, end token is unchanged)
      16955237001963240173058271559858726497
      40670782773005619916245995581909898190

      Tokens returned by final iteration (first token is last token from
      previous, end token is unchanged)
      [] (empty)

      In this example, the mapper has processed 7 rows in total, 2 of which
      were duplicates.
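The mechanics of that loop can be seen in a small simulation. This is a hypothetical sketch, not the actual ColumnFamilyRecordReader code, using shortened stand-ins for the tokens above (split token 53; rows at 16, 40, 99, 144, 166), and assuming the server returns batches in absolute token order:

```java
import java.util.ArrayList;
import java.util.List;

public class DuplicatePaging {
    // Tokens of the 5 rows, in absolute order.
    static final List<Integer> RING = List.of(16, 40, 99, 144, 166);

    // get_range_slices stand-in: returns tokens in (start, end] in
    // ABSOLUTE token order; start == end means the full ring.
    public static List<Integer> rangeSlice(int start, int end) {
        List<Integer> out = new ArrayList<>();
        for (int t : RING) {
            boolean in = (start == end)
                || (start < end ? (t > start && t <= end)   // normal range
                                : (t > start || t <= end)); // wrapping range
            if (in) out.add(t);
        }
        return out;
    }

    // The reader pages by restarting from the last token of each batch.
    // Because batches come back in absolute order rather than ring order,
    // the restart point is wrong and rows get re-fetched.
    public static int countRowsProcessed(int splitToken) {
        int processed = 0;
        int start = splitToken;
        List<Integer> batch;
        while (!(batch = rangeSlice(start, splitToken)).isEmpty()) {
            processed += batch.size();
            start = batch.get(batch.size() - 1);
        }
        return processed;
    }
}
```

With a split token of 53, the first batch is all 5 rows (ending at 166), the second is the wrapped pair 16 and 40 again, and the third is empty: 7 rows processed, 2 of them duplicates, matching the example above.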

      Attachments

      1. 1042-0_6.txt
        3 kB
        Jeremy Hanna
      2. 1042-test.txt
        5 kB
        Jonathan Ellis
      3. 1042-v2.txt
        8 kB
        Jonathan Ellis
      4. cassandra.tar.gz
        3 kB
        Jeremy Hanna
      5. Cassandra-1042-0_6-branch.patch.txt
        2 kB
        Jeremy Hanna
      6. CASSANDRA-1042-trunk.patch.txt
        5 kB
        Jeremy Hanna
      7. duplicate_keys.rtf
        0.8 kB
        Jeremy Hanna

        Activity

        Jonathan Ellis added a comment -

        This sounds like something we could make a unit test for, without having to get Hadoop itself involved.

        Christophe Biocca added a comment -

        The basic issue is that the Thrift server's return value is sorted by the absolute value of the tokens, while the CassandraRecordReader assumes that the order is the one given by traversal of the range (that is, we get the smallest value greater than start_token in first position, and the greatest value smaller than or equal to end_token in last position).
        I don't know which is correct, as the API docs I've looked at don't say which order is supposed to be returned. But if the server's implementation is correct, then the record reader needs to iterate over the returned tokens to figure out which one is actually the last token for iteration purposes. Otherwise, switching the server's implementation to return keys in iteration order will work.
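If the server's absolute-order behavior were kept, the record-reader side of this might look like the following helper. This is an illustrative sketch with integer stand-ins for tokens, not Cassandra code: scan the batch for the token that is actually last in traversal order of the (start, end] range.

```java
import java.util.Collections;
import java.util.List;

public class TraversalOrder {
    // For a batch returned in absolute token order, find the token that
    // is last in traversal (ring) order of the range (start, end].
    // If the range wraps (start >= end), the wrapped-around tokens
    // (those <= end) come last; otherwise the maximum token is last.
    public static int lastInTraversalOrder(List<Integer> batch, int start, int end) {
        if (start < end) return Collections.max(batch);
        int last = Integer.MIN_VALUE;
        for (int t : batch)
            if (t <= end && t > last) last = t;
        // no wrapped-around tokens yet: the largest token > start is last
        return last == Integer.MIN_VALUE ? Collections.max(batch) : last;
    }
}
```

For the example in the description (batch {16, 40, 99, 144, 166} in absolute order, start and end token 53), this picks 40 as the true last token rather than 166.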

        Jeremy Hanna added a comment -

        Hmm, I was able to reproduce this with the contrib/word_count piece on trunk. It appears to double count rows in ranges that have a single row as well as those that have more - in this case 1000.

        Jeremy Hanna added a comment -

        Just some more data - when I use an OrderPreservingPartitioner, the word count works fine...

        Jeremy Hanna added a comment - edited

        The describe_splits call it makes returns 3 sub-splits, the first of which is a wrapping split. Sounds like it's buggy on the server side. Will check with Jonathan.

        Jeremy Hanna added a comment -

        Jonathan:

        Inputs to client.describe_splits() - ColumnFamilyInputSplit:185:
        range.start_token: 85469146195799762548951268272529359452
        range.end_token: 85469146195799762548951268272529359452
        splitsize: 65536

        output:
        splits - ArrayList<String>:
        0: 85469146195799762548951268272529359452
        1: 85469146195799762548951268272529359452

        Seems like that is the bug right there, but I'm not familiar enough with what it's supposed to do in that case.

        Btw, there are only 4 rows in the CF.

        Jeremy Hanna added a comment -

        Appears to be something server related in the splits themselves.

        Jeremy Hanna added a comment -

        In order to facilitate reproducing the problem, I'm attaching my Cassandra data directory, tarred and gzipped.

        There are 4 rows in the cassandra instance. If you modify WordCountSetup to change TEST_COUNT from 4 to 1, then run WordCount, you will find that Cassandra trunk will count 7 occurrences instead of 4. You can also debug on the line I mentioned previously to see what describe_splits receives and then outputs.

        Just wanted to facilitate reproducing the problem.

        Jeremy Hanna added a comment -

        Adding a patch that does the following:

        1. Removes an ordering section in StorageProxy that messes with the wrapping range for a get_range_slice call, thereby messing up the order of the records returned. That led to the initial wrapping range being returned in token order instead of wrapping order, so there was a second call going from the last token (in natural ordering) all the way back to the initial start token. If the server's token were 5 and there were 10 tokens, it would list 1-10, then 1-5 again. With this fix, the return order of the tokens is 6-10, then 1-5, which is correct: the wrapped portion of the range first, then in token order.

        2. A few instances of token.toString that should have been TokenFactory.toString(token) - fixed.

        3. There was a method in StorageService - getStringEndpointMap - that is never called; removed it.

        4. Updated WordCountSetup with the latest trunk to use new Clock(System.currentTimeMillis())

        Jeremy Hanna added a comment -

        Thanks to Stu Hood for helping me narrow this down.

        Jeremy Hanna added a comment -

        To clarify how this fixes the problem: it removes some ordering in StorageProxy.getRangeIterator, since getRestrictedRanges should already have returned the right thing.

        Jonathan Ellis added a comment -

        Looks good to me. Nice work, Jeremy and Stu.

        Can you submit a version against 0.6 branch too?

        Jeremy Hanna added a comment -

        Attaching patches for 0.6-branch and trunk.

        Jeremy Hanna added a comment -

        Jonathan - I updated the trunk patch to not add a couple of unused imports that snuck in while I was messing with WordCount.

        Jeremy Hanna added a comment - edited

        Also - I didn't remove StorageService.getStringEndpointMap in the 0.6 branch version because CassandraServer.get_string_property still calls it. get_string_property was removed on trunk as part of CASSANDRA-965.

        Jonathan Ellis added a comment -

        committed, thanks!

        Jonathan Ellis added a comment -

        done

        Jonathan Ellis added a comment -

        re-opening in light of CASSANDRA-1198

        Jeremy Hanna added a comment -

        Unwrapped the tokens up front, ensuring that the splits would not contain wraps. Works fine now.
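That unwrapping step can be sketched as follows. This is a hypothetical illustration with integer tokens and a stand-in minimum token, not the patch itself: a range whose start token is greater than or equal to its end token wraps around the ring, and is replaced by two non-wrapping pieces split at the ring's minimum token.

```java
import java.util.List;

public class Unwrap {
    static final int MIN_TOKEN = 0; // stand-in for the partitioner's minimum token

    // Ranges are (start, end]; end == MIN_TOKEN means "to the end of the ring".
    public static List<int[]> unwrap(int start, int end) {
        if (start < end) return List.of(new int[]{start, end}); // no wrap
        // wrapping (or full-ring) range -> two non-wrapping halves
        return List.of(new int[]{start, MIN_TOKEN}, new int[]{MIN_TOKEN, end});
    }
}
```

A full-ring split like (5, 5] becomes (5, min] and (min, 5], so no individual split ever wraps and the reader's paging assumption holds within each split.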

        Jeremy Hanna added a comment -

        Attaching new patch.

        Jeremy Hanna added a comment -

        The patch should apply cleanly to 0.7/trunk as well.

        Stu Hood added a comment -

        +1

        Jonathan Ellis added a comment -

        do we have a theory as to why wrapped ranges should cause bugs?

        I worry that if we're just trying code out and it seems to work, we may not be fixing the real problem.

        Jeremy Hanna added a comment - edited

        Good point.

        From what I could tell in this instance, it would go through the input splits, and on the last input split it would have an incorrect last value. So it would go back through and take that value to the end of the input list. I would imagine that is where it had wrapped. I'm not sure why it had the incorrect last value as the last value in the wrapped input split, though; if someone is wiser than I in these matters, please chime in. But it appears that normalizing how the splits are done so that no split wraps internally solves the problem.

        To reproduce easily and with a small dataset: if you don't apply the patch and run word_count_setup with only 10 values for text3, that will usually be enough to manifest the problem when running wordcount.

        Also, if the wrap can be detected when creating the splits, as with this patch, then presumably wrapping could also be detected when reading the rows in the ColumnFamilyRecordReader. That could be another way to resolve it, but I think it's six of one, half a dozen of the other.

        Like I said, I'm not certain why that incorrect ordering happens on the wrapped split.

        Jeremy Hanna added a comment -

        Adding the output of word count with duplicate tokens. It appears to happen when the input split contains a wrapped key range. That's why the updated patch splits wrapped key ranges (fixing the problem).

        Jeremy Hanna added a comment -

        Sorry if this is redundant but pasting in a thought we had a while ago that motivated the attached patch. If we make sure that the splits are always in ring order and never wrap, it solves the problem.

        "Token ranges may also wrap – that is, the end token may be less than the start one. Thus, a range from keyX to keyX is a one-element range, but a range from tokenY to tokenY is the full ring."

        It does not say what order they will be in when it wraps. Some clients assume that the ordering is natural order while the hadoop client interactions assume that it will be ring order.

        For example:
        – a list of tokens (1,2,3,4,5,6,7,8,9)
        – a get_range_slice call with start_token = 5, end_token = 5
        Natural order means token order from start to finish, returning the results (1,2,3,4,5,6,7,8,9).
        Ring order (or wrapping order) means it would return the results (5,6,7,8,9,1,2,3,4).
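The two orderings can be made concrete with a comparator. This is an illustrative sketch with integer tokens, not Cassandra code; it treats the range as exclusive of the start token and inclusive of the end, so ring order for start_token = 5 comes out as (6,7,8,9,1,2,3,4,5):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class RingOrder {
    // Sort tokens by their position around the ring, starting just
    // after `start`: tokens greater than start come first (ascending),
    // then the wrapped-around tokens less than or equal to start.
    public static List<Integer> ringSort(List<Integer> tokens, int start) {
        List<Integer> sorted = new ArrayList<>(tokens);
        sorted.sort(Comparator.comparingLong(
            t -> t > start ? (long) t - start
                           : (long) t - start + (1L << 32)));
        return sorted;
    }
}
```

Natural order is just the plain ascending sort; ring order is the same list rotated to start just after the query's start token.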

        Jonathan Ellis added a comment -

        the "correct" order when tokens are involved is ring order

        (when start_key is used instead of start_token, you can't have a wrapping range so it should be moot)

        Jonathan Ellis added a comment -

        patching CFIF isn't the answer, we need any client using the API to get the right results

        Jonathan Ellis added a comment -

        It seems that the root of the problem is, as Jeremy said, rows getting returned in token order instead of ring order. If, in Joost's original example, the rows were returned in order of

        99079589977253916124855502156832923443
        144992942750327304334463589818972416113
        166860289390734216023086131251507064403
        16955237001963240173058271559858726497
        40670782773005619916245995581909898190

        then doing an extra query for (40670782773005619916245995581909898190, 53193025635115934196771903670925341736]

        would return the desired result of nothing.

        but I am unable to reproduce this behavior in a unit test (against 0.6 branch, attached). trying jeremy's data dir (also against 0.6 branch), I get "java.io.IOException: Found system table files, but they couldn't be loaded. Did you change the partitioner?"

        Jonathan Ellis added a comment -

        Jeremy pointed out that the sorting removed by the original patch here sorts in raw token order rather than taking into account the requested start token. I think that's our problem, although I'm not sure why my unit test isn't running into it.

        Jonathan Ellis added a comment -

        ah, the unit test hits CFS directly instead of going through StorageProxy (where the sort happens)...

        Jonathan Ellis added a comment -

        v2 attached:

        • removes wrapped-range handling from CFS.getRangeSlice, since StorageProxy always unwraps first
        • adds (initially failing) system test exercising wrapped-range path
        • adds sorting of unwrapped, restricted ranges relative to the original query range [this is the bug fix]
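The third bullet, the actual bug fix, might be sketched like this. This is a hypothetical illustration with integer tokens standing in for real tokens and ranges as {start, end} pairs, not the v2 patch itself: after unwrapping, the restricted sub-ranges are sorted by the ring position of their start token relative to the original query's start, so results come back in traversal order rather than absolute token order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class RangeSort {
    // Sort sub-ranges by ring distance from the query's start token
    // to each range's own start token, so iteration proceeds around
    // the ring in traversal order.
    public static List<int[]> sortRelativeTo(List<int[]> ranges, int queryStart) {
        List<int[]> sorted = new ArrayList<>(ranges);
        sorted.sort(Comparator.comparingLong(
            r -> r[0] >= queryStart ? (long) r[0] - queryStart
                                    : (long) r[0] - queryStart + (1L << 32)));
        return sorted;
    }
}
```

For a query starting at token 6, the sub-ranges {6,9}, {9,1}, {1,5} come out in that order rather than sorted by raw token value, which is exactly the distinction between ring order and token order discussed above.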
        Show
        jbellis Jonathan Ellis added a comment - v2 attached: removes wrapped-range handling from CFS.getRangeSlice, since StorageProxy always unwraps first adds (initially failing) system test exercising wrapped-range path adds sorting of unwrapped, restricted ranges relative to the original query range [this is the bug fix]
        Hide
        jeromatron Jeremy Hanna added a comment -

        +1

        Jonathan Ellis added a comment -

        committed

        Jonathan Ellis added a comment -

        Backported a related fix from CASSANDRA-1156 to 0.6.5 in r982580


          People

          • Assignee:
            Jonathan Ellis
          • Reporter:
            Joost Ouwerkerk
          • Votes: 0
          • Watchers: 6
