IMPALA / IMPALA-5208

Forked Breakpad process blocks indefinitely in WaitForContinueSignal and causes a new impalad process to fail at startup

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 2.7.0
    • Fix Version/s: Impala 2.9.0
    • Component/s: Backend
    • Labels: None

      Description

      A new Impala process fails to start:

      E0414 10:17:56.761270 893048 logging.cc:121] stderr will be logged to this file.
      E0414 10:17:59.897265 893215 thrift-server.cc:182] ThriftServer 'backend' (on port: 22000) exited due to TException: Could not bind: Transport endpoint is not connected
      E0414 10:17:59.897356 893048 thrift-server.cc:171] ThriftServer 'backend' (on port: 22000) did not start correctly
      F0414 10:17:59.899677 893048 impalad-main.cc:89] ThriftServer 'backend' (on port: 22000) did not start correctly
      . Impalad exiting.
      

      Call stack from the hung Breakpad fork:

      (gdb) bt
      #0  0x0000000001b80c9f in google_breakpad::ExceptionHandler::WaitForContinueSignal() ()
      #1  0x0000000001b80ddd in google_breakpad::ExceptionHandler::ThreadEntry(void*) ()
      #2  0x0000000001b805db in google_breakpad::ExceptionHandler::GenerateDump(google_breakpad::ExceptionHandler::CrashContext*) ()
      #3  0x0000000000000000 in ?? ()
      

      PS output

       
      ps -e --format='pid ppid pgid user args' | grep impala
       383619       1  383612 impala   python2.7 /usr/lib64/cmf/agent/build/env/bin/cmf-redactor /usr/lib64/cmf/service/impala/impala.sh impalad impalad_flags false
       405348  368389  405348 mmokhtar python /opt/cloudera/parcels/CDH-5.11.0-1.cdh5.11.0.p0.14/bin/../lib/impala-shell/impala_shell.py -k
       852304  852233  852304 mmokhtar python /opt/cloudera/parcels/CDH-5.11.0-1.cdh5.11.0.p0.14/bin/../lib/impala-shell/impala_shell.py
       872925       1  383612 impala   /opt/cloudera/parcels/CDH-5.11.0-1.cdh5.11.0.p0.14/lib/impala/sbin-retail/impalad --flagfile=/run/cloudera-scm-agent/process/60723-impala-IMPALAD/impala-conf/impalad_flags
       880656  852233  880656 mmokhtar python /opt/cloudera/parcels/CDH-5.11.0-1.cdh5.11.0.p0.14/bin/../lib/impala-shell/impala_shell.py -k
       881074  852233  881074 mmokhtar python /opt/cloudera/parcels/CDH-5.11.0-1.cdh5.11.0.p0.14/bin/../lib/impala-shell/impala_shell.py -k -i va1335.halxg.cloudera.com
       883949  874541  883948 mmokhtar grep --color=auto impala
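
A listing like the one above can be scanned for impalad processes that have been reparented to init (PPID 1). The sketch below is only an illustration: the helper names `parse_ps` and `orphaned_impalads` are hypothetical, and treating an orphaned impalad as the hung Breakpad fork is an assumption that this issue does not state explicitly.

```python
import os
from typing import List, NamedTuple


class PsRow(NamedTuple):
    pid: int
    ppid: int
    pgid: int
    user: str
    args: str


def parse_ps(output: str) -> List[PsRow]:
    """Parse `ps -e --format='pid ppid pgid user args'` output into rows."""
    rows = []
    for line in output.splitlines():
        fields = line.split(None, 4)
        # Skip the header line and anything that is not a full process row.
        if len(fields) < 5 or not all(f.isdigit() for f in fields[:3]):
            continue
        pid, ppid, pgid, user, args = fields
        rows.append(PsRow(int(pid), int(ppid), int(pgid), user, args))
    return rows


def orphaned_impalads(rows: List[PsRow]) -> List[PsRow]:
    """Return rows whose executable is `impalad` and whose parent is PID 1.

    Assumption: an impalad reparented to init is a candidate for the hung fork.
    """
    return [
        r for r in rows
        if r.ppid == 1 and os.path.basename(r.args.split()[0]) == "impalad"
    ]


# Two rows copied from the listing in this issue.
SAMPLE = """\
 383619       1  383612 impala   python2.7 /usr/lib64/cmf/agent/build/env/bin/cmf-redactor /usr/lib64/cmf/service/impala/impala.sh impalad impalad_flags false
 872925       1  383612 impala   /opt/cloudera/parcels/CDH-5.11.0-1.cdh5.11.0.p0.14/lib/impala/sbin-retail/impalad --flagfile=/run/cloudera-scm-agent/process/60723-impala-IMPALAD/impala-conf/impalad_flags
"""

candidates = orphaned_impalads(parse_ps(SAMPLE))
```

Attaching gdb to a candidate PID and checking for `WaitForContinueSignal` in the backtrace, as in the call stack earlier in this issue, would be the way to confirm it.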
      

          Activity

          Lars Volker added a comment:

          This looks like Breakpad Bug #728. This is the third time I have seen it mentioned in the past few weeks. The chance of triggering it looks rather small to me, so I wonder why it suddenly shows up more frequently.

          There's a change out for review upstream, but it hasn't found a reviewer yet: https://chromium-review.googlesource.com/c/464708/

          If this doesn't make progress upstream soon, we should consider adding it to our toolchain as a patch.

          Lars Volker added a comment:

          IMPALA-5187, IMPALA-5208: Bump Breakpad Version, undo IMPALA-3794

          This change switches to a new Breakpad version, which includes fixes for
          Breakpad bugs #681 and #728. The toolchain change was reviewed here:
          https://gerrit.cloudera.org/6866

          The change also undoes the workaround introduced in IMPALA-3794.

          In addition to running test_breakpad.py in a loop for a while, I tested
          Then I verified that the test fails with the old toolchain version
          (88e5b2) and works with the new one (ffe3e4).

          To test #728 I added a sleep() call before SendContinueSignalToChild()
          and then killed the parent process, manually observing that the child
          would die, too.

          Change-Id: Ic541ccd565f2bb51f68c085747fc47ae8c905d19
          Reviewed-on: http://gerrit.cloudera.org:8080/6883
          Reviewed-by: Lars Volker <lv@cloudera.com>
          Tested-by: Impala Public Jenkins
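
The #728 test described above (delay the continue signal, kill the parent, check that the child dies) can be approximated without Breakpad. The following is a minimal sketch, not Breakpad's code. It assumes the continue signal is a byte sent over a pipe and that a waiting process can only notice the sender's death through end-of-file on that pipe; neither detail is stated in this issue. The function name and parameters are hypothetical, and the parent's death is simulated by closing its pipe ends, which is what the kernel does when a process exits.

```python
import os
import signal
import time


def child_wakes_when_parent_goes_away(child_closes_write_end: bool,
                                      timeout_s: float = 1.0) -> bool:
    """Return True if a child blocked on a pipe read exits after the
    parent drops the pipe without ever sending the continue byte."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: stand-in for the process waiting for the continue signal.
        try:
            if child_closes_write_end:
                os.close(write_fd)
            # Returns b"" (EOF) only once every write end is closed,
            # including the copy this child inherited.
            os.read(read_fd, 1)
        finally:
            os._exit(0)

    # Parent: never send the byte; just drop both pipe ends.
    os.close(read_fd)
    os.close(write_fd)

    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        done_pid, _ = os.waitpid(pid, os.WNOHANG)
        if done_pid == pid:
            return True  # child saw EOF and exited
        time.sleep(0.01)

    # Child is still blocked in read(); clean it up.
    os.kill(pid, signal.SIGKILL)
    os.waitpid(pid, 0)
    return False
```

With `child_closes_write_end=False` the child keeps its own copy of the write end open, so the read never sees end-of-file and the child blocks until killed. With `True` the child exits as soon as the parent's write end is gone.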


            People

            • Assignee: Lars Volker (lv)
            • Reporter: Mostafa Mokhtar (mmokhtar)
