IMPALA-5749

Race in coordinator hits DCHECK on 'num_remaining_backends_ > 0'

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: Impala 2.10.0
    • Fix Version/s: Impala 2.10.0
    • Component/s: Backend
    • Labels: None

    Description

      While running 'test_finst_cancel_when_query_complete' in a loop to reproduce a different issue, I discovered a race in Coordinator::UpdateBackendExecStatus that causes Impala to crash on 'DCHECK_GT(num_remaining_backends_, 0)'.

      Only the first exec report received for a particular backend after it has completed is supposed to reach line 992, where we decrement 'num_remaining_backends_'. Per the comments, this is supposed to be ensured by the BackendState::IsDone check on line 945.

      However, the check and the update aren't performed atomically. Two threads can enter UpdateBackendExecStatus at the same time, both check BackendState::IsDone and find it false, and then both proceed to decrement num_remaining_backends_, with the second one hitting the DCHECK.
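One way to close a check-then-act window like this is to fold the check and the state change into a single locked operation that reports whether the caller performed the transition. The following is a minimal sketch of that idea only, not the actual Impala code or the committed fix: BackendStateSketch, CoordinatorSketch, MarkDone and RunDuplicateReports are hypothetical names, and a plain assert stands in for the DCHECK.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a per-backend state object (not Impala's class).
class BackendStateSketch {
 public:
  // Checks and sets the done flag under one lock. Returns true only for the
  // single caller that performs the not-done -> done transition.
  bool MarkDone() {
    std::lock_guard<std::mutex> l(lock_);
    if (done_) return false;
    done_ = true;
    return true;
  }

 private:
  std::mutex lock_;
  bool done_ = false;
};

// Hypothetical stand-in for the coordinator's bookkeeping.
class CoordinatorSketch {
 public:
  explicit CoordinatorSketch(int num_backends)
    : backends_(num_backends), num_remaining_backends_(num_backends) {}

  // Simulates receiving a final exec report for backend 'idx'. Duplicate or
  // concurrent reports for the same backend return early.
  void UpdateBackendExecStatus(int idx) {
    if (!backends_[idx].MarkDone()) return;
    std::lock_guard<std::mutex> l(lock_);
    assert(num_remaining_backends_ > 0);  // stands in for the DCHECK
    --num_remaining_backends_;
  }

  int num_remaining_backends() {
    std::lock_guard<std::mutex> l(lock_);
    return num_remaining_backends_;
  }

 private:
  std::vector<BackendStateSketch> backends_;
  std::mutex lock_;
  int num_remaining_backends_;
};

// Sends 'num_reporters' concurrent final reports for every backend and
// returns how many backends the coordinator still considers remaining.
int RunDuplicateReports(int num_backends, int num_reporters) {
  CoordinatorSketch coord(num_backends);
  std::vector<std::thread> threads;
  for (int t = 0; t < num_reporters; ++t) {
    threads.emplace_back([&coord, num_backends] {
      for (int i = 0; i < num_backends; ++i) coord.UpdateBackendExecStatus(i);
    });
  }
  for (std::thread& th : threads) th.join();
  return coord.num_remaining_backends();
}
```

In this sketch each backend is counted exactly once regardless of how many reports race, so a call such as RunDuplicateReports(3, 8) leaves no remaining backends and never trips the assertion. Splitting MarkDone back into a separate IsDone check followed by a later update would reintroduce the window described above.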

          People

            Assignee: twmarshall Thomas Tauber-Marshall
            Reporter: twmarshall Thomas Tauber-Marshall
            Votes: 0
            Watchers: 6
