  IMPALA-5150

Uneven load distribution of work across NUMA nodes

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 2.6.0
    • Fix Version/s: Impala 2.9.0
    • Component/s: Backend
    • Labels:
      None

      Description

      While doing concurrency testing as part of competitive benchmarking, I noticed that it is very difficult to saturate all CPUs at 100%.
      Below is a snapshot from htop taken during a concurrency run; the state shown closely mimics steady state. Note that CPUs 41-60 are noticeably less busy than CPUs 1-20.

      I then ran the command below, which dumps each impalad's threads along with the processor each thread is currently assigned to:
      for i in $(pgrep impalad); do ps -mo pid,tid,fname,user,psr -p $i;done

      From the man page for ps :

      psr        PSR      processor that process is currently assigned to.
      

      The output showed a large number of threads running on core 61. Not surprisingly, those ~1K threads were all thrift-server threads, so I am wondering whether this concentration is skewing the kernel's ability to distribute the threads evenly across the cores.
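As a quick way to see the per-core pile-up, the ps output above can be aggregated by its PSR column. A minimal sketch; the TID/PSR sample values in the here-doc are fabricated for illustration, and on a live system you would instead pipe in the pgrep/ps loop shown above:

```shell
# Tally how many threads are assigned to each processor (PSR column).
# On a live system, feed this the output of:
#   for i in $(pgrep impalad); do ps -mo pid,tid,fname,user,psr -p $i; done
count_threads_per_cpu() {
  # Skip header/summary lines; count rows whose last field is a numeric PSR.
  awk '$NF ~ /^[0-9]+$/ { count[$NF]++ }
       END { for (c in count) print "cpu", c ":", count[c], "threads" }'
}

# Illustration with fabricated TID/PSR pairs:
count_threads_per_cpu <<'EOF'
  TID PSR
 1001  61
 1002  61
 1003   5
EOF
```

A heavily skewed tally (e.g. hundreds of threads on one core) would confirm what htop showed visually.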

      I did a follow-up experiment by profiling different core ranges on the system:
      Run 80 concurrent queries dominated by shuffle exchange
      Profile cores 01-20 to foo_01-20
      Profile cores 41-60 to foo_41-60
      Results showed that:
      Cores 01-20 had 50% more instructions retired
      Cores 01-20 showed significantly more contention on pthread_cond_wait, base::internal::SpinLockDelay, and __lll_lock_wait
      Skew is dominated by DataStreamSender
      ScannerThread(s) also show significant skew
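The per-core-range profiling above can be approximated with perf stat, which restricts counting to a CPU list via -C when run system-wide with -a. The exact commands used in the original runs are not recorded here, so the commands below are an assumed reconstruction; the ratio helper just makes the "50% more instructions retired" comparison concrete with example numbers:

```shell
# Sketch (assumed commands): count instructions retired on two core ranges
# while the concurrent query workload runs (requires perf and suitable
# privileges):
#   perf stat -a -C 1-20  -e instructions -- sleep 60 2> foo_01-20
#   perf stat -a -C 41-60 -e instructions -- sleep 60 2> foo_41-60
# Then compare the two instruction counts; a ratio of 1.50 means the first
# range retired 50% more instructions than the second.
ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

ratio 150 100   # example counts illustrating the observed 50% skew
```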

        Issue Links

          Activity

          tarmstrong Tim Armstrong added a comment -

          Assigned it back to Mostafa to confirm that IMPALA-4923 solves it


            People

            • Assignee:
              mmokhtar Mostafa Mokhtar
              Reporter:
              mmokhtar Mostafa Mokhtar
            • Votes:
              0
              Watchers:
              2
