Apache HAWQ / HAWQ-1573

Crash during proc_exit when writing a message to the server log

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Core
    • Labels: None

      Description

      Core stack:

      #0 0x00007fba3973bfcb in raise () from /lib64/libpthread.so.0
      #1 0x0000000000975e3f in SafeHandlerForSegvBusIll (processName=0xc6bbbb "Master process", postgres_signal_arg=11)
      at elog.c:4537
      #2 0x0000000000975ffe in StandardHandlerForSigillSigsegvSigbus_OnMainThread (
      processName=0xc6bbbb "Master process", postgres_signal_arg=11) at elog.c:4615
      #3 0x0000000000891804 in CdbProgramErrorHandler (postgres_signal_arg=11) at postgres.c:3609
      #4 <signal handler called>
      #5 0x00007fba3867d431 in __strlen_sse2_pminub () from /lib64/libc.so.6
      #6 0x00000000009729cc in append_string_to_pipe_chunk (buffer=0x7ffe98a042c0,
      input=0x4257c50 <Address 0x4257c50 out of bounds>) at elog.c:2660
      #7 0x0000000000973d12 in write_message_to_server_log (elevel=15, sqlerrcode=0,
      message=0x21b0260 "clean up communication to resource manager now.", detail=0x0, hint=0x0,
      query_text=0x4257c50 <Address 0x4257c50 out of bounds>, cursorpos=0, internalpos=0, internalquery=0x0,
      context=0x0, funcname=0xcc9450 <__func__.29200> "cleanupQD2RMComm", show_funcname=0 '\000',
      filename=0xcc83e0 "rmcomm_QD2RM.c", lineno=460, stacktracesize=21, omit_location=1 '\001', send_alert=0 '\000',
      stacktracearray=0x10c8f38 <errordata+120>, printstack=0 '\000') at elog.c:3246
      #8 0x0000000000973fc4 in send_message_to_server_log (edata=0x10c8ec0 <errordata>) at elog.c:3296
      #9 0x0000000000970b50 in EmitErrorReport () at elog.c:1495
      #10 0x000000000096e589 in errfinish (dummy=0) at elog.c:602
      #11 0x0000000000970a7d in elog_finish (elevel=15, fmt=0xcc85c8 "clean up communication to resource manager now.")
      at elog.c:1464
      #12 0x00000000009d29da in cleanupQD2RMComm () at rmcomm_QD2RM.c:460
      #13 0x0000000000875d73 in proc_exit_prepare (code=1) at ipc.c:240
      #14 0x0000000000875c24 in proc_exit (code=1) at ipc.c:101
      #15 0x000000000096e78d in errfinish (dummy=0) at elog.c:671
      #16 0x00000000008919d2 in ProcessInterrupts () at postgres.c:3693
      #17 0x00000000006eedf6 in ExecProcNode (node=0x425cfd0) at execProcnode.c:862
      #18 0x00000000006e9132 in ExecutePlan (estate=0x425c400, planstate=0x425cfd0, operation=CMD_SELECT, numberTuples=0,
      direction=ForwardScanDirection, dest=0x37d2c00) at execMain.c:3211
      #19 0x00000000006e5cdc in ExecutorRun (queryDesc=0x4258270, direction=ForwardScanDirection, count=0)
      at execMain.c:1214
      #20 0x00000000008991dd in PortalRunSelect (portal=0x3850e50, forward=1 '\001', count=0, dest=0x37d2c00)
      at pquery.c:1737
      

      I found a relevant fix in the PostgreSQL history:

      commit e1eb7c81192bec3735eed3228202b400f31c8010
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Sat Mar 20 00:58:21 2010 +0000
      
          Clear error_context_stack and debug_query_string at the beginning of proc_exit,
          so that we won't try to attach any context printouts to messages that get
          emitted while exiting.  Per report from Dennis Koegel, the context functions
          won't necessarily work after we've started shutting down the backend, and it
          seems possible that debug_query_string could be pointing at freed storage
          as well.  The context information doesn't seem particularly relevant to
          such messages anyway, so there's little lost by suppressing it.
      
          Back-patch to all supported branches.  I can only demonstrate a crash with
          log_disconnections messages back to 8.1, but the risk seems real in 8.0 and
          before anyway.
      

      I saw that Ming LI backported something related in HAWQ-1208, but I don't know why this was not fixed in Greenplum at that time (it is fixed there now). We need to apply the same patch to HAWQ as well.
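
      To make the proposed change concrete, here is a minimal, self-contained sketch of the idea in the commit message above: clear error_context_stack and debug_query_string before any exit-time callback can log. It is not the HAWQ or PostgreSQL source. The two globals are simplified stand-ins for the backend variables of the same names, and log_message_sketch, cleanup_callback_sketch, last_exit_log and proc_exit_prepare_sketch are hypothetical names introduced only for illustration.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Simplified stand-in for the backend's error context callback list. */
typedef struct ErrorContextCallback
{
    struct ErrorContextCallback *previous;
    void (*callback) (void *arg);
    void *arg;
} ErrorContextCallback;

/* Stand-ins for the backend globals named in the commit message. */
const char *debug_query_string = NULL;
ErrorContextCallback *error_context_stack = NULL;

/* Hypothetical buffer that captures the last message logged during exit. */
char last_exit_log[256];

/*
 * Hypothetical logger: runs any registered context callbacks and appends
 * the query text only when debug_query_string is set. With a stale
 * pointer, this is the step that would read freed memory.
 */
void
log_message_sketch(char *out, size_t outlen, const char *message)
{
    ErrorContextCallback *cb;

    for (cb = error_context_stack; cb != NULL; cb = cb->previous)
        cb->callback(cb->arg);

    if (debug_query_string != NULL)
        snprintf(out, outlen, "%s | query: %s", message, debug_query_string);
    else
        snprintf(out, outlen, "%s", message);
}

/* Hypothetical exit callback, standing in for cleanupQD2RMComm(). */
void
cleanup_callback_sketch(void)
{
    log_message_sketch(last_exit_log, sizeof(last_exit_log),
                       "clean up communication to resource manager now.");
}

/*
 * Sketch of the fix: reset both globals first, so that messages emitted
 * by exit callbacks carry no context and no query text.
 */
void
proc_exit_prepare_sketch(void)
{
    error_context_stack = NULL;
    debug_query_string = NULL;

    cleanup_callback_sketch();
}
```

      In this sketch, the message logged from the exit callback no longer touches the query string at all, which corresponds to frames #6 and #7 in the core stack, where query_text points at an out-of-bounds address.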

              People

              • Assignee: Radar Lei (rlei)
              • Reporter: Kuien Liu (kuien)