Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: llap
    • Labels: None

      Description

      Running q27 with split-waves 0.9 on 10 nodes x 16 executors, I get 140 mappers reading store_sales and about 5 other assorted vertices.
      When running the query repeatedly, one would expect good locality, i.e. the same splits (files+stripes) being processed on the same nodes most of the time.
      However, in my experience this holds for only 40-50% of the stripes. When the query is run 10 times in a row, an average split (file+stripe) is read on ~4 different machines, and some splits are read on a different machine every run.

      This reduces the cache hit ratio.
      Understandably, in real scenarios we won't get 100% locality, but we should not be getting bad locality in simple cases like this.
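
      The locality figures above (share of splits that stay on one node, average number of machines per split) could be computed from per-run placement records roughly as follows. This is only a sketch: the (run_id, split_id, node) record format and the function name are assumptions made for illustration, not something LLAP or Tez emits in this form.

```python
from collections import defaultdict


def locality_stats(assignments):
    """Summarize split locality across repeated runs of one query.

    `assignments` is an iterable of (run_id, split_id, node) tuples.
    This record format is hypothetical; it would have to be extracted
    from whatever task/split placement logs are actually available.
    """
    nodes_per_split = defaultdict(set)  # split -> distinct nodes that read it
    runs_per_split = defaultdict(set)   # split -> runs in which it was read

    for run_id, split_id, node in assignments:
        nodes_per_split[split_id].add(node)
        runs_per_split[split_id].add(run_id)

    total = len(nodes_per_split)
    if total == 0:
        return {"avg_nodes_per_split": 0.0,
                "stable_fraction": 0.0,
                "always_moving_fraction": 0.0}

    # Average number of distinct machines each split was read on.
    avg_nodes = sum(len(n) for n in nodes_per_split.values()) / total

    # Splits that were read on exactly one machine in every run.
    stable = sum(1 for n in nodes_per_split.values() if len(n) == 1)

    # Splits that landed on a different machine in every run
    # (only meaningful for splits seen in more than one run).
    moving = sum(
        1 for s, n in nodes_per_split.items()
        if len(runs_per_split[s]) > 1 and len(n) == len(runs_per_split[s])
    )

    return {"avg_nodes_per_split": avg_nodes,
            "stable_fraction": stable / total,
            "always_moving_fraction": moving / total}


# Toy data: split "f1:0" stays on node1, split "f1:1" moves every run.
sample = [
    (1, "f1:0", "node1"), (1, "f1:1", "node1"),
    (2, "f1:0", "node1"), (2, "f1:1", "node2"),
    (3, "f1:0", "node1"), (3, "f1:1", "node3"),
]
print(locality_stats(sample))
```

      With 10 runs of the query described here, a stable_fraction around 0.4-0.5 and an avg_nodes_per_split near 4 would correspond to the behaviour reported in this issue.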

            People

            • Assignee: Siddharth Seth (sseth)
            • Reporter: Sergey Shelukhin (sershe)
            • Votes: 0
            • Watchers: 2
