IMPALA-7980

High system CPU time usage (and waste) when runtime filters filter out files


Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: Impala 3.2.0
    • Labels: ghx-label-5

    Description

      When running TPC-DS query 1 on scale factor 10,000 (10TB) on a 140-node cluster with replica_preference=remote, we observed really high system CPU usage for some of the scan nodes:

      HDFS_SCAN_NODE (id=6):(Total: 59s107ms, non-child: 59s107ms, % non-child: 100.00%)
      - BytesRead: 80.50 MB (84408563)
      - ScannerThreadsSysTime: 36m17s

      Note the ratio: the scanner threads accumulated over 36 minutes of system CPU time in a scan node whose total time was only about 59 seconds.
      

      Using perf, we discovered a lot of time spent in futex_wait, pthread_cond_wait, and the like. (We also used perf to record context switches and cycles.) Interestingly, watching in top, we saw the very high system CPU usage spike only some time into the query.

      We believe what's going on is that we start many ScannerThread instances, which first wait until the initial ranges have been issued and then grab data using impala::io::ScanRange::GetNext(). They do this in a loop that acquires two locks per iteration, until the query is done or there are no num_unqueued_files_ left. If num_unqueued_files_ stays above zero, these threads just loop through the two lock acquisitions and nothing else. We believe that this hot loop is eating system CPU aggressively; a simplified sketch of the pattern follows.
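
      The following is a standalone sketch of that pattern, assuming a deliberately simplified structure: the names TryGetNextRange, range_lock_, and queue_lock_ are stand-ins for illustration, not Impala's actual code.

      #include <atomic>
      #include <mutex>
      #include <optional>

      // Simplified stand-ins for the shared scan-range state; not the real classes.
      std::mutex range_lock_;                    // first lock taken each iteration
      std::mutex queue_lock_;                    // second lock taken each iteration
      std::atomic<int> num_unqueued_files_{1};   // files whose ranges are not issued yet
      bool done_ = false;

      // Pretend no range is ready yet: ranges for the remaining files have not
      // been issued, so there is nothing to hand out.
      std::optional<int> TryGetNextRange() {
        std::lock_guard<std::mutex> l(queue_lock_);
        return std::nullopt;
      }

      // The busy loop: while files remain unqueued but no range is available,
      // each scanner thread keeps acquiring both locks, finds no work, and
      // immediately loops again. Contended lock acquisitions end up in futex
      // system calls, which is where the system CPU time goes.
      void ScannerThreadLoop() {
        while (true) {
          {
            std::lock_guard<std::mutex> l(range_lock_);
            if (done_) return;
          }
          std::optional<int> range = TryGetNextRange();
          if (range) {
            // ... process the scan range ...
          } else if (num_unqueued_files_.load() == 0) {
            return;  // no more work will ever arrive
          }
          // else: spin around the loop again without ever blocking.
        }
      }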

      It's a bit interesting that this is exacerbated in the case with more remote reads. Our best guess is that some of the reads take significantly longer in this case, and a single outlier can extend this period of waste.
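
      The ticket does not describe how this was ultimately resolved; purely as an illustration of why the loop above is wasteful, one common way to avoid this kind of spin is to block on a condition variable until a range is queued or the thread has a reason to exit. A hypothetical sketch, not Impala's actual code:

      #include <condition_variable>
      #include <mutex>

      // Hypothetical, simplified shared state.
      std::mutex lock_;
      std::condition_variable work_available_;
      int ranges_ready_ = 0;      // scan ranges queued and ready to hand out
      int unqueued_files_ = 1;    // files whose ranges have not been issued yet
      bool done_ = false;

      // Instead of re-acquiring locks in a tight loop, the thread sleeps until
      // there is a range to process or a reason to exit, so idle waiting costs
      // essentially no CPU.
      void ScannerThreadLoop() {
        std::unique_lock<std::mutex> l(lock_);
        while (true) {
          work_available_.wait(l, [] {
            return done_ || ranges_ready_ > 0 || unqueued_files_ == 0;
          });
          if (done_) return;
          if (ranges_ready_ > 0) {
            --ranges_ready_;
            // In real code the lock would be dropped here while the range is
            // processed, then re-acquired before waiting again.
          } else {
            return;  // unqueued_files_ == 0 and nothing queued: all work issued
          }
        }
      }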

          People

            Assignee: Unassigned
            Reporter: Philip Martin
