[CASSANDRA-1246] Hadoop output SlicePredicate is slow and doesn't work as intended

Details

Type: Bug
Status: Resolved
Priority: Normal
Resolution: Fixed
Fix Version/s: 0.7 beta 1
Component/s: None
Labels: None
Severity: Normal

Description

The output SlicePredicate is used only to check that no data already exists in the range we are about to write to. This check is

(a) slow, because it performs get_range_slices across the entire key range, so it hits every node in the cluster when there is no data (which is supposed to be the normal case)
(b) wrong, because it appears to rely on keyList.size so that data in column X does not interfere with an output to column Y, but that is not how get_range_slices works: if a row has data (or even a tombstone) in any column, its key comes back in the result list. A correct check would have to scan every key and inspect the list of columns returned, which is prohibitively slow when data actually exists in other columns.
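
To illustrate point (b), here is a minimal sketch in Java. It uses a toy in-memory model, not the real Thrift API: the names RangeSliceModel, KeySlice, getRangeSlices, naiveHasData, scanningHasData and sampleRows are all hypothetical, and the model assumes only the behavior described above (a key is returned whenever its row has data in any column, and the predicate filters just the columns listed under that key).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Toy in-memory model of a column family; NOT the Cassandra Thrift API. */
final class RangeSliceModel {

    /** One returned row: the key plus the columns that matched the predicate (may be empty). */
    static final class KeySlice {
        final String key;
        final List<String> columns;

        KeySlice(String key, List<String> columns) {
            this.key = key;
            this.columns = columns;
        }
    }

    /**
     * Models the behavior described in the issue: every row with data in ANY
     * column yields a key; the predicate only filters the columns under it.
     */
    static List<KeySlice> getRangeSlices(Map<String, Set<String>> rows, Set<String> predicate) {
        List<KeySlice> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> row : rows.entrySet()) {
            List<String> matched = new ArrayList<>();
            for (String column : row.getValue()) {
                if (predicate.contains(column)) {
                    matched.add(column);
                }
            }
            result.add(new KeySlice(row.getKey(), matched));
        }
        return result;
    }

    /** The flawed check: treats any returned key as existing output data. */
    static boolean naiveHasData(Map<String, Set<String>> rows, Set<String> predicate) {
        return !getRangeSlices(rows, predicate).isEmpty();
    }

    /** A correct check: inspects the columns of every key, so it is O(rows) even when empty. */
    static boolean scanningHasData(Map<String, Set<String>> rows, Set<String> predicate) {
        for (KeySlice slice : getRangeSlices(rows, predicate)) {
            if (!slice.columns.isEmpty()) {
                return true;
            }
        }
        return false;
    }

    /** Hypothetical sample data: two rows that only have data in column X. */
    static Map<String, Set<String>> sampleRows() {
        Map<String, Set<String>> rows = new TreeMap<>();
        rows.put("row1", Collections.singleton("X"));
        rows.put("row2", Collections.singleton("X"));
        return rows;
    }
}
```

With the sample rows and a predicate on column Y, naiveHasData reports true even though nothing has been written to Y, while scanningHasData reports false but has to walk every returned key to do so.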

Attachments

1246.txt, 02/Jul/10 15:34, 5 kB, Jonathan Ellis

People

Assignee: Jonathan Ellis

Reporter: Jonathan Ellis

Authors: Jonathan Ellis

Votes: 0

Watchers: 0

Dates

Created: 02/Jul/10 15:30

Updated: 16/Apr/19 09:33

Resolved: 07/Jul/10 13:21