Description
Spark (and, according to danburkert, MR now as well) adds the locations of all of a tablet's replicas as split locations. This makes sense, except that the Java client currently always scans the leader replica. So in many cases we schedule a task that is "local" to a follower, and it then ends up doing a remote scan against the leader.
This makes Spark queries take about twice as long on replicated tables as on unreplicated tables, and I think it is a regression on the MR side.
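To make the mismatch concrete, here is a minimal sketch of the behavior described above. It is a standalone model, not Kudu client code: `SplitLocalitySketch`, `Replica`, `Role`, and the host names are all hypothetical and exist only for illustration. It assumes the split advertises every replica host while the scan always goes to the leader.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the problem; none of these types come from the Kudu client.
final class SplitLocalitySketch {
    enum Role { LEADER, FOLLOWER }

    static final class Replica {
        final String host;
        final Role role;

        Replica(String host, Role role) {
            this.host = host;
            this.role = role;
        }
    }

    // Split locations as described in this issue: every replica host is advertised.
    static List<String> splitLocations(List<Replica> replicas) {
        List<String> hosts = new ArrayList<>();
        for (Replica r : replicas) {
            hosts.add(r.host);
        }
        return hosts;
    }

    // Host that actually serves the scan when the client always picks the leader.
    static String scanHost(List<Replica> replicas) {
        for (Replica r : replicas) {
            if (r.role == Role.LEADER) {
                return r.host;
            }
        }
        throw new IllegalStateException("tablet has no leader");
    }

    // A task is only truly local if it runs on the host that serves the scan.
    static boolean isRemoteScan(String taskHost, List<Replica> replicas) {
        return !taskHost.equals(scanHost(replicas));
    }

    // Number of advertised locations that would lead to a remote scan.
    static int remoteLocationCount(List<Replica> replicas) {
        int remote = 0;
        for (String host : splitLocations(replicas)) {
            if (isRemoteScan(host, replicas)) {
                remote++;
            }
        }
        return remote;
    }

    public static void main(String[] args) {
        List<Replica> tablet = Arrays.asList(
            new Replica("host-a", Role.LEADER),
            new Replica("host-b", Role.FOLLOWER),
            new Replica("host-c", Role.FOLLOWER));
        System.out.println("advertised locations: " + splitLocations(tablet));
        System.out.println("scan served from: " + scanHost(tablet));
        System.out.println("locations that scan remotely: " + remoteLocationCount(tablet));
    }
}
```

In this model, a tablet with three replicas advertises three "local" hosts, but a task placed on either follower host still reads from the leader over the network. Advertising only the leader's host, or letting the client scan whichever replica is local, would each remove the mismatch in this sketch.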
Issue Links
- is related to: KUDU-1704 Add a new read mode to perform bounded staleness snapshot reads (Resolved)