  1. Apache HAWQ (Retired)
  2. HAWQ-1618

Segment panic at workfile_mgr_close_file() when transaction ROLLBACK

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.4.0.0
    • Component/s: Query Execution
    • Labels: None

    Description

      Log:

      
      2018-05-23 15:49:14.843058 UTC,"user","db",p179799,th401824032,"172.31.6.17","6935",2018-05-23 15:47:39 UTC,1260445558,con25148,cmd7,seg21,slice82,,x1260445558,sx1,"ERROR","25M01","*canceling MPP operation*",,,,,,"INSERT INTO ...
      2018-05-23 15:49:15.253671 UTC,,,p179799,th0,,,2018-05-23 15:47:39 UTC,0,con25148,cmd7,seg21,slice82,,,,"PANIC","XX000","Unexpected internal error: Segment process received signal SIGSEGV",,,,,,,0,,,,"1    0x8ce2a3 postgres gp_backtrace + 0xa3
      2    0x8ce491 postgres <symbol not found> + 0x8ce491
      3    0x7f2d147ae7e0 libpthread.so.0 <symbol not found> + 0x147ae7e0
      4    0x91f4ad postgres workfile_mgr_close_file + 0xd
      5    0x90bc84 postgres <symbol not found> + 0x90bc84
      6    0x4e6b60 postgres AbortTransaction + 0x240
      7    0x4e75c5 postgres AbortCurrentTransaction + 0x25
      8    0x7ed81a postgres PostgresMain + 0x6ea
      9    0x7a0c50 postgres <symbol not found> + 0x7a0c50
      10   0x7a3a19 postgres PostmasterMain + 0x759
      11   0x4a5309 postgres main + 0x519
      12   0x7f2d13cead1d libc.so.6 __libc_start_main + 0xfd
      13   0x4a5389 postgres <symbol not found> + 0x4a5389"
      
      

       

      Core stack:

      
      (gdb) bt
      #0  0x00007f2d147ae6ab in raise () from libpthread.so.0
      #1  0x00000000008ce552 in SafeHandlerForSegvBusIll (postgres_signal_arg=11, processName=<optimized out>) at elog.c:4573
      #2  <signal handler called>
      #3  *workfile_mgr_close_file* (work_set=0x0, file=0x7f2ce96d2de0, canReportError=canReportError@entry=0 '\000') at workfile_file.c:129
      #4  0x000000000090bc84 in *ntuplestore_cleanup* (fNormal=0 '\000', canReportError=0 '\000', ts=0x21f4810) at tuplestorenew.c:654
      #5  XCallBack_NTS (event=event@entry=XACT_EVENT_ABORT, nts=nts@entry=0x21f4810) at tuplestorenew.c:674
      #6  0x00000000004e6b60 in CallXactCallbacksOnce (event=<optimized out>) at xact.c:3660
      #7  AbortTransaction () at xact.c:2871
      #8  0x00000000004e75c5 in AbortCurrentTransaction () at xact.c:3377
      #9  0x00000000007ed81a in PostgresMain (argc=<optimized out>, argv=<optimized out>, argv@entry=0x182c900, username=0x17ddcd0 "user") at postgres.c:4648
      #10 0x00000000007a0c50 in BackendRun (port=0x17cfb10) at postmaster.c:5915
      #11 BackendStartup (port=0x17cfb10) at postmaster.c:5484
      #12 ServerLoop () at postmaster.c:2163
      #13 0x00000000007a3a19 in PostmasterMain (argc=<optimized out>, argv=<optimized out>) at postmaster.c:1454
      #14 0x00000000004a5309 in main (argc=9, argv=0x1785d10) at main.c:226
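
      Frame #3 of the core stack shows work_set=0x0, which suggests the crash is a NULL work set being dereferenced while the abort callback cleans up. The sketch below only illustrates that failure mode and one possible guard. Every name in it (mock_work_set, mock_file, mock_close_file) is a hypothetical stand-in, not the HAWQ source, and it is not the fix that was actually committed for this issue.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; NOT the real HAWQ types. */
typedef struct mock_work_set {
    int open_files;   /* number of files still tracked by this set */
} mock_work_set;

typedef struct mock_file {
    bool is_open;
} mock_file;

/*
 * Close a file that may or may not still belong to a work set.
 * Returns the number of files left in the set, or -1 when no set
 * was supplied (for example, when cleanup runs during an abort
 * after the set has already been released).
 */
int mock_close_file(mock_work_set *work_set, mock_file *file)
{
    /* Always release the file itself, even on the abort path. */
    if (file != NULL)
        file->is_open = false;

    /* Guard: touching work_set->open_files here without this
     * check is the kind of access that would raise SIGSEGV. */
    if (work_set == NULL)
        return -1;

    if (work_set->open_files > 0)
        work_set->open_files--;
    return work_set->open_files;
}
```

      Under these assumptions, calling mock_close_file(NULL, &f) closes f and returns -1 instead of crashing, while a call with a valid set decrements its counter as usual.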
      
      

       

      Repro:

      
      -- create the test tables
      
      drop table if exists testsisc; 
      create table testsisc (i1 int, i2 int, i3 int, i4 int); 
      insert into testsisc select i, i % 1000, i % 100000, i % 75 from generate_series(0,19999) i;
      
      
      drop table if exists to_insert_into; 
      create table to_insert_into as 
      with ctesisc as 
       (select count(i1) as c1,i3 as c2 from testsisc group by i3)
      select t1.c1 as c11, t1.c2 as c12, t2.c1 as c21, t2.c2 as c22
      from ctesisc as t1, ctesisc as t2
      where t1.c1 = t2.c2
      limit 10;
      
      -- run a long-running query
      
      begin;
      
      set gp_simex_run=on;
      set gp_cte_sharing=on;
      
      insert into to_insert_into
      with ctesisc as 
       (select count(i1) as c1,i3 as c2 from testsisc group by i3)
      select *
      from ctesisc as t1, ctesisc as t2
      where t1.c1 = t2.c2;
      
      commit;
      
      

      Kill one segment process while the second query (the INSERT inside the transaction) is running. The PANIC entry then appears in the segment log.

       

            People

              Hongxu Ma