[CASSANDRA-6268] Poor performance of Hadoop if any DC is using VNodes - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Normal
Resolution: Fixed
Fix Version/s: 1.2.13, 2.0.4
Component/s: None
Labels: None

Description

Some customers are complaining about a huge number of splits in Hadoop caused by VNodes. Disabling vnodes only in the Hadoop DC does not fix it. Splits are generated from the results of describe_ring, which returns a huge number of ranges anyway, and doesn't take into account that many consecutive ranges will reside on the same nodes in the DC we'd like the M/R job to run in.
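Disabling vnodes only in the Hadoop DC does not help because the token ring is shared by all DCs, so describe_ring reports one range per token in the whole cluster (assuming all tokens are distinct). The back-of-the-envelope sketch below uses a hypothetical cluster: 3 single-token nodes in a Hadoop DC and 6 nodes with num_tokens = 256 in another DC. The DC names, node counts and the helper `ring_range_count` are illustrative assumptions.

```python
from typing import Dict, Tuple


def ring_range_count(dcs: Dict[str, Tuple[int, int]]) -> int:
    """Ranges describe_ring would report: one per token in the whole cluster.

    `dcs` maps a (hypothetical) DC name to (node count, tokens per node),
    and all tokens are assumed to be distinct.
    """
    return sum(nodes * num_tokens for nodes, num_tokens in dcs.values())


cluster = {"Analytics": (3, 1), "Cassandra": (6, 256)}
print(ring_range_count(cluster))  # 1539
```

If every range yields at least one split, such a job starts with at least 1539 splits even though only the 3 nodes of the Hadoop DC run it.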

The proposed fix:
1. allows specifying the DC(s) the Hadoop job should run in (in DSE, this defaults to all Hadoop DCs)
2. merges consecutive ranges before generating Hadoop splits, so that vnodes in the other DCs no longer split ranges artificially
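A minimal sketch of the two steps, assuming each range's replicas and each node's data center are known (for example, from the endpoint details that Thrift's describe_ring reports). `TokenRange`, `merge_ranges`, and the node and DC names are hypothetical and not the patch's actual code. Replicas outside the target DC(s) are dropped, and adjacent ranges that end up with the same replica set are merged into one.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Sequence, Set


@dataclass(frozen=True)
class TokenRange:
    """Hypothetical stand-in for one describe_ring entry: range (start, end]."""
    start: int
    end: int
    endpoints: FrozenSet[str]


def local_replicas(rng: TokenRange, node_dc: Dict[str, str], dcs: Set[str]) -> FrozenSet[str]:
    """Step 1: keep only replicas living in the DC(s) the job should run in."""
    return frozenset(e for e in rng.endpoints if node_dc[e] in dcs)


def merge_ranges(ring: Sequence[TokenRange], node_dc: Dict[str, str],
                 dcs: Set[str]) -> List[TokenRange]:
    """Step 2: merge adjacent ranges whose replicas in `dcs` are identical.

    Assumes `ring` is sorted by token; the wrap-around range is not merged.
    """
    merged: List[TokenRange] = []
    for rng in ring:
        replicas = local_replicas(rng, node_dc, dcs)
        if merged and merged[-1].end == rng.start and merged[-1].endpoints == replicas:
            # Same replicas as the previous range: extend it instead of adding a split.
            prev = merged.pop()
            merged.append(TokenRange(prev.start, rng.end, replicas))
        else:
            merged.append(TokenRange(rng.start, rng.end, replicas))
    return merged


# Hypothetical ring: a vnode DC cuts (0, 100] into ten ranges with alternating
# replicas c1/c2, while the Hadoop DC has one replica per half of the ring.
node_dc = {"h1": "Analytics", "h2": "Analytics", "c1": "Cassandra", "c2": "Cassandra"}
ring = [
    TokenRange(t, t + 10, frozenset({"h1" if t < 50 else "h2",
                                     "c1" if (t // 10) % 2 == 0 else "c2"}))
    for t in range(0, 100, 10)
]
splits = merge_ranges(ring, node_dc, {"Analytics"})
print(len(ring), [(r.start, r.end, sorted(r.endpoints)) for r in splits])
# 10 [(0, 50, ['h1']), (50, 100, ['h2'])]
```

Filtered to the Analytics DC, the ten vnode-sized ranges collapse into two, one per Hadoop node. With both DCs selected nothing merges, because the vnode DC's replica alternates between adjacent ranges.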

For non-DSE users this feature is turned off by default and doesn't change the old behaviour.

Attachments

- 6268-thrift-2.0.txt, 149 kB, 06/Dec/13 12:04, Piotr Kolaczkowski
- 6268-thrift-1.2.txt, 311 kB, 13/Nov/13 13:50, Piotr Kolaczkowski
- 6268-src-2.0.txt, 8 kB, 13/Nov/13 13:44, Piotr Kolaczkowski
- 6268-src-1.2.txt, 8 kB, 13/Nov/13 13:50, Piotr Kolaczkowski

People

Assignee: Piotr Kolaczkowski
Reporter: Piotr Kolaczkowski
Authors: Piotr Kolaczkowski
Reviewers: Jonathan Ellis
Votes: 0
Watchers: 7

Dates

Created: 29/Oct/13 19:47
Updated: 16/Apr/19 09:32
Resolved: 06/Dec/13 15:52