Kudu / KUDU-1454

Spark and MR jobs running without scan locality

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.8.0
    • Fix Version/s: 1.6.0
    • Component/s: client, perf, spark
    • Labels: None

      Description

      Spark (and, according to Dan Burkert, MR as well now) adds all of a tablet's replica locations as split locations. This makes sense, except that the Java client currently always scans the leader replica. So in many cases we schedule a task that is "local" to a follower, and it then ends up having to do a remote scan.

      This makes Spark queries take about twice as long on replicated tables as on unreplicated tables, and I think it is a regression on the MR side.
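
      To make the mismatch concrete, here is a minimal sketch in Python. It is a toy model, not the Kudu client API: the `Replica` class, the host names, and both selection functions are hypothetical, and it assumes one task is scheduled on each advertised split location. It compares how often a scan is actually local when the client always picks the leader versus when it picks a replica on the task's own host.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Replica:
    # Hypothetical model of one tablet replica; not a Kudu client class.
    host: str
    is_leader: bool


def pick_leader_only(replicas: List[Replica], task_host: str) -> Replica:
    """Always scan the leader, regardless of where the task runs."""
    return next(r for r in replicas if r.is_leader)


def pick_closest(replicas: List[Replica], task_host: str) -> Replica:
    """Prefer a replica on the task's host; fall back to the leader."""
    for r in replicas:
        if r.host == task_host:
            return r
    return pick_leader_only(replicas, task_host)


def local_scan_fraction(
    replicas: List[Replica],
    pick: Callable[[List[Replica], str], Replica],
) -> float:
    """Fraction of local scans, assuming one task per split location."""
    hosts = [r.host for r in replicas]
    local = sum(1 for h in hosts if pick(replicas, h).host == h)
    return local / len(hosts)


# A tablet with three replicas, all advertised as split locations.
tablet = [
    Replica("node-a", True),
    Replica("node-b", False),
    Replica("node-c", False),
]

leader_only = local_scan_fraction(tablet, pick_leader_only)
closest = local_scan_fraction(tablet, pick_closest)
print(f"leader-only: {leader_only:.2f} local, closest-replica: {closest:.2f} local")
```

      Under these assumptions, only the task that happens to land on the leader's host scans locally (one in three for a three-replica tablet), while selecting a replica on the task's host keeps every scan local. The model says nothing about the actual slowdown, which depends on network and scan cost.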

              People

              • Assignee: Hao Hao (hahao)
              • Reporter: Todd Lipcon (tlipcon)
              • Votes: 3
              • Watchers: 10
