Kudu / KUDU-1454

Spark and MR jobs running without scan locality

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.8.0
    • Fix Version/s: 1.6.0
    • Component/s: client, perf, spark
    • Labels: None

    Description

      Spark (and, according to danburkert, now MR as well) adds every replica location of a tablet as a split location. That makes sense, except that the Java client currently always scans the leader replica. So in many cases a task is scheduled "local" to a follower and then has to do a remote scan.

      This makes Spark queries take about twice as long on replicated tables as on unreplicated ones, and I think it is a regression on the MR side.
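
The mismatch described above can be illustrated with a small, self-contained model. This is only a sketch and does not use the Kudu client: the `Tablet` class, the host names, and the policy labels `"leader_only"` and `"closest_replica"` are hypothetical names introduced for illustration. It assumes the scheduler is equally likely to place a task on any advertised split location.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tablet:
    """Hypothetical model of a tablet: one leader host plus follower hosts."""
    leader: str
    followers: tuple = ()

    @property
    def replicas(self):
        # Every replica host is advertised to the scheduler as a split location.
        return (self.leader, *self.followers)


def pick_scan_host(tablet, task_host, policy):
    """Return the host the scan is served from under the given (hypothetical) policy."""
    if policy == "leader_only":
        # Behaviour described in the issue: the scan always goes to the leader.
        return tablet.leader
    if policy == "closest_replica":
        # Illustrative alternative: use a replica on the task's own host if one exists.
        return task_host if task_host in tablet.replicas else tablet.leader
    raise ValueError(f"unknown policy: {policy}")


def local_scan_fraction(tablet, policy):
    """Fraction of advertised split locations at which the scan would be local."""
    hosts = tablet.replicas
    local = sum(1 for host in hosts if pick_scan_host(tablet, host, policy) == host)
    return local / len(hosts)


replicated = Tablet(leader="host-a", followers=("host-b", "host-c"))
unreplicated = Tablet(leader="host-a")

print(local_scan_fraction(replicated, "leader_only"))      # 1 of 3 locations is local
print(local_scan_fraction(replicated, "closest_replica"))  # every location is local
print(local_scan_fraction(unreplicated, "leader_only"))    # single replica is always local
```

Under these assumptions, a tablet with three replicas gets a local scan at only one of its three advertised locations when scans always go to the leader, while an unreplicated tablet is always scanned locally. That is in line with the gap between replicated and unreplicated tables reported here.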

            People

              Assignee: Hao Hao (hahao)
              Reporter: Todd Lipcon (tlipcon)
              Votes: 3
              Watchers: 10
