Details
- Type: Improvement
- Status: Resolved
- Priority: Low
- Resolution: Fixed
Description
For very large numbers of keys (billions), the current hardcoded 4096 keys per InputSplit could cause the split generator to OOM, since all splits are held in memory at once. So we want to make two changes:
1) make the number of keys per split configurable
2) make the record reader page through rows instead of assuming it can read all rows into memory at once
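A minimal sketch of change (1), assuming the split size is read from job configuration. The property name `cassandra.input.split.size` and the class/method names here are illustrative, not the project's actual API; `java.util.Properties` stands in for Hadoop's `Configuration`.

```java
import java.util.Properties;

public class SplitSizeConfig {
    // The value currently hardcoded in the split generator.
    static final int DEFAULT_SPLIT_SIZE = 4096;

    // Read the keys-per-split count from configuration, falling back
    // to the old hardcoded default when the property is unset.
    // (Property name is an assumption for illustration.)
    static int getInputSplitSize(Properties conf) {
        String v = conf.getProperty("cassandra.input.split.size");
        return v == null ? DEFAULT_SPLIT_SIZE : Integer.parseInt(v);
    }
}
```

With this in place, jobs that hit memory pressure during split generation can simply raise the keys-per-split value, producing fewer splits to hold in memory at once.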
Note: going back to specifying the number of splits instead of the number of keys is bad for two reasons. First, it does not work with the standard Hadoop mapred.min.split.size configuration option. Second, it means we have no way of measuring progress in the record reader, since we have no idea how many keys are in the split. If we specify the number of keys, we know (to within a small margin of error) how many keys to expect, even when paging.
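The paging behavior and the progress argument above can be sketched as follows. This is a hypothetical, self-contained illustration (a `List` stands in for the storage layer, and the class and method names are not Cassandra's actual record reader API): rows are fetched one fixed-size page at a time rather than all at once, and progress is computable because the expected key count of the split is known up front.

```java
import java.util.ArrayList;
import java.util.List;

public class PagingRecordReader {
    private final List<String> source;   // stands in for the underlying storage
    private final int pageSize;          // configurable rows fetched per page
    private int nextIndex = 0;           // cursor into the split
    private int keysRead = 0;            // rows returned so far, for progress
    private List<String> page = new ArrayList<>();
    private int pagePos = 0;

    public PagingRecordReader(List<String> source, int pageSize) {
        this.source = source;
        this.pageSize = pageSize;
    }

    // Fetch the next batch of at most pageSize rows; only one page
    // is ever held in memory, unlike reading the whole split at once.
    private void fetchPage() {
        page.clear();
        pagePos = 0;
        int end = Math.min(nextIndex + pageSize, source.size());
        page.addAll(source.subList(nextIndex, end));
        nextIndex = end;
    }

    // Return the next row, or null when the split is exhausted.
    public String next() {
        if (pagePos >= page.size()) {
            fetchPage();
            if (page.isEmpty()) return null;
        }
        keysRead++;
        return page.get(pagePos++);
    }

    // Progress is measurable because the split carries an expected key
    // count; with split-count-based splits this would be impossible.
    public float progress(int expectedKeys) {
        if (expectedKeys == 0) return 1.0f;
        return Math.min(1.0f, keysRead / (float) expectedKeys);
    }
}
```

Because each page is bounded by `pageSize`, memory use no longer scales with the number of rows in the split, which is the point of change (2).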
See CASSANDRA-775, CASSANDRA-342 for background.