[KUDU-1454] Spark and MR jobs running without scan locality - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: 0.8.0
Fix Version/s: 1.6.0
Component/s: client, perf, spark
Labels: None
Target Version/s: 1.6.0

Description

Spark (and, according to danburkert, MR as well now) adds all of a tablet's replica locations as split locations. This makes sense, except that the Java client currently always scans the leader replica. So in many cases we schedule a task that is "local" to a follower, and it then ends up having to do a remote scan.

This makes Spark queries take about twice as long on replicated tables as on unreplicated tables, and I think it is a regression on the MR side.
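
To make the mismatch concrete, here is a minimal sketch (not Kudu or Spark code) that counts how often a task's scan stays on its own host. It assumes the scheduler is equally likely to place the task on any advertised split location. The function name, the host names, and the two policy labels are hypothetical and used for illustration only.

```python
from fractions import Fraction


def local_scan_fraction(replica_hosts, leader, policy):
    """Return the fraction of task placements whose scan is host-local.

    Assumption: the task may be placed on any host in replica_hosts,
    each with equal probability, because all of them are advertised
    as split locations.
    """
    if leader not in replica_hosts:
        raise ValueError("leader must be one of the replica hosts")

    local = 0
    for task_host in replica_hosts:
        if policy == "leader_only":
            # Behaviour described in this issue: always scan the leader.
            scan_host = leader
        elif policy == "closest_replica":
            # Hypothetical alternative: scan the replica on the task's host.
            scan_host = task_host
        else:
            raise ValueError("unknown policy: " + policy)
        if scan_host == task_host:
            local += 1
    return Fraction(local, len(replica_hosts))


replicas = ["node-a", "node-b", "node-c"]
print(local_scan_fraction(replicas, "node-a", "leader_only"))      # 1/3
print(local_scan_fraction(replicas, "node-a", "closest_replica"))  # 1
```

Under these assumptions, a tablet with three replicas gets a local scan for only one placement in three when the client always reads from the leader, while an unreplicated tablet is always local because its only location is the leader.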

Issue Links

is related to: KUDU-1704 Add a new read mode to perform bounded staleness snapshot reads (Resolved)

People

Assignee: Hao Hao

Reporter: Todd Lipcon

Votes: 3

Watchers: 10

Dates

Created: 13/May/16 00:36

Updated: 04/Dec/17 19:46

Resolved: 04/Dec/17 19:46