  1. Apache HAWQ
  2. HAWQ-812

Activating the standby master fails after creating a new database

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0.0-incubating
    • Component/s: None
    • Labels: None

    Description

      Activating the standby master fails after a new database has been created. However, it succeeds if no new database is created, even if we create a new table and insert data.
      1. Create a new database 'gptest'

      [gpadmin@test1 ~]$ psql -l
                       List of databases
         Name    |  Owner  | Encoding | Access privileges
      -----------+---------+----------+-------------------
       postgres  | gpadmin | UTF8     |
       template0 | gpadmin | UTF8     |
       template1 | gpadmin | UTF8     |
      (3 rows)
      
      [gpadmin@test1 ~]$ createdb gptest
      [gpadmin@test1 ~]$ psql -l
                       List of databases
         Name    |  Owner  | Encoding | Access privileges
      -----------+---------+----------+-------------------
       gptest    | gpadmin | UTF8     |
       postgres  | gpadmin | UTF8     |
       template0 | gpadmin | UTF8     |
       template1 | gpadmin | UTF8     |
      (4 rows)
      

      2. Stop HAWQ master

      [gpadmin@test1 ~]$ hawq stop master -a
      20160613:20:13:44:068559 hawq_stop:test1:gpadmin-[INFO]:-Prepare to do 'hawq stop'
      20160613:20:13:44:068559 hawq_stop:test1:gpadmin-[INFO]:-You can find log in:
      20160613:20:13:44:068559 hawq_stop:test1:gpadmin-[INFO]:-/home/gpadmin/hawqAdminLogs/hawq_stop_20160613.log
      20160613:20:13:44:068559 hawq_stop:test1:gpadmin-[INFO]:-GPHOME is set to:
      20160613:20:13:44:068559 hawq_stop:test1:gpadmin-[INFO]:-/data/pulse-agent-data/HAWQ-main-FeatureTest-opt-mutilnodeparallel-wcl/product/hawq/.
      20160613:20:13:44:068559 hawq_stop:test1:gpadmin-[INFO]:-Stop hawq with args: ['stop', 'master']
      20160613:20:13:45:068559 hawq_stop:test1:gpadmin-[INFO]:-There are 0 connections to the database
      20160613:20:13:45:068559 hawq_stop:test1:gpadmin-[INFO]:-Commencing Master instance shutdown with mode='smart'
      20160613:20:13:45:068559 hawq_stop:test1:gpadmin-[INFO]:-Master host=test1
      20160613:20:13:45:068559 hawq_stop:test1:gpadmin-[INFO]:-Stop hawq master
      20160613:20:13:46:068559 hawq_stop:test1:gpadmin-[INFO]:-Master stopped successfully
      

      3. Activate standby master

      [gpadmin@test1 ~]$ ssh test5 'source /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-mutilnodeparallel-wcl/product/hawq/./greenplum_path.sh; hawq activate standby -a'
      20160613:20:14:14:126841 hawq_activate:test5:gpadmin-[INFO]:-Prepare to do 'hawq activate'
      20160613:20:14:14:126841 hawq_activate:test5:gpadmin-[INFO]:-You can find log in:
      20160613:20:14:14:126841 hawq_activate:test5:gpadmin-[INFO]:-/home/gpadmin/hawqAdminLogs/hawq_activate_20160613.log
      20160613:20:14:14:126841 hawq_activate:test5:gpadmin-[INFO]:-GPHOME is set to:
      20160613:20:14:14:126841 hawq_activate:test5:gpadmin-[INFO]:-/data/pulse-agent-data/HAWQ-main-FeatureTest-opt-mutilnodeparallel-wcl/product/hawq/.
      20160613:20:14:14:126841 hawq_activate:test5:gpadmin-[INFO]:-Activate hawq with args: ['activate', 'standby']
      20160613:20:14:14:126841 hawq_activate:test5:gpadmin-[INFO]:-Starting to activate standby master 'test5'
      20160613:20:14:15:126841 hawq_activate:test5:gpadmin-[INFO]:-HAWQ master is not running, skip
      20160613:20:14:15:126841 hawq_activate:test5:gpadmin-[INFO]:-Stopping all the running segments
      20160613:20:14:21:126841 hawq_activate:test5:gpadmin-[INFO]:-
      20160613:20:14:21:126841 hawq_activate:test5:gpadmin-[INFO]:-Stopping running standby
      20160613:20:14:23:126841 hawq_activate:test5:gpadmin-[INFO]:-Update master host name in hawq-site.xml
      20160613:20:14:31:126841 hawq_activate:test5:gpadmin-[INFO]:-GUC hawq_master_address_host already exist in hawq-site.xml
      Update it with value: test5
      20160613:20:14:31:126841 hawq_activate:test5:gpadmin-[INFO]:-Remove current standby from hawq-site.xml
      20160613:20:14:39:126841 hawq_activate:test5:gpadmin-[INFO]:-Start master in master only mode
      
      

      It hangs and cannot start the master. The master log is as follows:

      2016-06-13 20:14:40.268022 PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","database system was shut down at 2016-06-13 20:02:50 PDT",,,,,,,0,,"xlog.c",6205,
      2016-06-13 20:14:40.268112 PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","found recovery.conf file indicating standby takeover recovery needed",,,,,,,0,,"xlog.c",5485,
      2016-06-13 20:14:40.268131 PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","checkpoint record is at 0/1C75EF0",,,,,,,0,,"xlog.c",6304,
      2016-06-13 20:14:40.268143 PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","redo record is at 0/1C75EF0; undo record is at 0/0; shutdown TRUE",,,,,,,0,,"xlog.c",6338,
      2016-06-13 20:14:40.268155 PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","next transaction ID: 0/1003; next OID: 16508",,,,,,,0,,"xlog.c",6342,
      2016-06-13 20:14:40.268165 PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","next MultiXactId: 1; next MultiXactOffset: 0",,,,,,,0,,"xlog.c",6345,
      2016-06-13 20:14:40.268176 PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","Forcing Crash Recovery for Master Standby takeover",,,,,,,0,,"xlog.c",6389,
      2016-06-13 20:14:40.268195 PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","standby takeover recovery in progress",,,,,,,0,,"xlog.c",6427,
      2016-06-13 20:14:40.268891 PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","redo starts at 0/1C75F40",,,,,,,0,,"xlog.c",6523,
      2016-06-13 20:14:40.273313 PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","record with zero length at 0/2639190",,,,,,,0,,"xlog.c",4110,
      2016-06-13 20:14:40.273338 PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","redo done at 0/2639140",,,,,,,0,,"xlog.c",6560,
      2016-06-13 20:14:40.273352 PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","end of transaction log location is 0/2639190",,,,,,,0,,"xlog.c",6582,
      2016-06-13 20:14:40.273460 PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","standby takeover recovery complete",,,,,,,0,,"xlog.c",5506,
      2016-06-13 20:14:40.274904 PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","Need to Repair global sequence number 600 so use scanned maximum value 749 ('gp_persistent_relfile_node')",,,,,,,0,,"cdbpersistentstore.c",519,
      2016-06-13 20:14:40.275093 PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","Finished startup pass 1.  Proceeding to startup crash recovery passes 2 and 3.",,,,,,,0,,"xlog.c",6816,
      2016-06-13 20:14:40.284820 PDT,,,p127519,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","Finished startup crash recovery pass 2",,,,,,,0,,"xlog.c",6987,
      2016-06-13 20:14:40.289053 PDT,,,p127520,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","recovery restart point at 0/1C75F40",,,,,"xlog redo checkpoint: redo 0/1C75F40; undo 0/0; tli 1; xid 0/1003; oid 16508; multi 1; offset 0; shutdown
      REDO PASS 3 @ 0/1C75F40; LSN 0/1C75F90: prev 0/1C75EF0; xid 0: XLOG - checkpoint: redo 0/1C75F40; undo 0/0; tli 1; xid 0/1003; oid 16508; multi 1; offset 0; shutdown",,0,,"xlog.c",8323,
      2016-06-13 20:14:40.291597 PDT,,,p127520,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","recovery restart point at 0/1C763A0",,,,,"xlog redo checkpoint: redo 0/1C763A0; undo 0/0; tli 1; xid 0/1021; oid 16508; multi 1; offset 0; shutdown
      REDO PASS 3 @ 0/1C763A0; LSN 0/1C763F0: prev 0/1C76370; xid 0: XLOG - checkpoint: redo 0/1C763A0; undo 0/0; tli 1; xid 0/1021; oid 16508; multi 1; offset 0; shutdown",,0,,"xlog.c",8323,
      2016-06-13 20:14:40.292625 PDT,,,p127520,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","recovery restart point at 0/1C763F0",,,,,"xlog redo checkpoint: redo 0/1C763F0; undo 0/0; tli 1; xid 0/1021; oid 16508; multi 1; offset 0; shutdown
      REDO PASS 3 @ 0/1C763F0; LSN 0/1C76440: prev 0/1C763A0; xid 0: XLOG - checkpoint: redo 0/1C763F0; undo 0/0; tli 1; xid 0/1021; oid 16508; multi 1; offset 0; shutdown",,0,,"xlog.c",8323,
      2016-06-13 20:14:40.295223 PDT,,,p127520,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","recovery restart point at 0/1C76D90",,,,,"xlog redo checkpoint: redo 0/1C76D90; undo 0/0; tli 1; xid 0/1046; oid 16508; multi 1; offset 0; online
      REDO PASS 3 @ 0/1C76D90; LSN 0/1C76DE0: prev 0/1C76D60; xid 0: XLOG - checkpoint: redo 0/1C76D90; undo 0/0; tli 1; xid 0/1046; oid 16508; multi 1; offset 0; online",,0,,"xlog.c",8323,
      2016-06-13 20:14:40.295618 PDT,,,p127520,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","recovery restart point at 0/1C76DE0",,,,,"xlog redo checkpoint: redo 0/1C76DE0; undo 0/0; tli 1; xid 0/1047; oid 16508; multi 1; offset 0; online
      REDO PASS 3 @ 0/1C76DE0; LSN 0/1C76E30: prev 0/1C76D90; xid 0: XLOG - checkpoint: redo 0/1C76DE0; undo 0/0; tli 1; xid 0/1047; oid 16508; multi 1; offset 0; online",,0,,"xlog.c",8323,
      2016-06-13 20:14:40.306365 PDT,,,p127520,th-1212462816,,,,0,,,seg-10000,,,,,"FATAL","58P01","could not open relation 1663/16508/1247: No such file or directory","Database directory ""base/16508"" does not exist",,,,"xlog redo newpage: rel 1663/16508/1247; blk 0
      REDO PASS 3 @ 0/1C7B7A8; LSN 0/1C83800: prev 0/1C7B360; xid 1052: Heap - newpage: rel 1663/16508/1247; blk 0",,0,,"md.c",1012,"Stack trace:
      1    0x87f232 postgres errstart + 0x252
      2    0x7ad57a postgres <symbol not found> + 0x7ad57a
      3    0x7ad678 postgres mdnblocks + 0x18
      4    0x7af3b6 postgres smgrnblocks + 0x16
      5    0x4f97e7 postgres XLogReadBuffer + 0x17
      6    0x4c1bf7 postgres heap_redo + 0x4e7
      7    0x4eb550 postgres <symbol not found> + 0x4eb550
      8    0x4f4b65 postgres StartupXLOG_Pass3 + 0x155
      9    0x4f6c08 postgres StartupProcessMain + 0x308
      10   0x55629d postgres AuxiliaryProcessMain + 0x5bd
      11   0x767706 postgres <symbol not found> + 0x767706
      12   0x7689ef postgres <symbol not found> + 0x7689ef
      13   0x76d7fd postgres <symbol not found> + 0x76d7fd
      14   0x76f34e postgres PostmasterMain + 0xc7e
      15   0x6c7e9a postgres main + 0x48a
      16   0x3e0541ed1d libc.so.6 __libc_start_main + 0xfd
      17   0x4a26a1 postgres <symbol not found> + 0x4a26a1
      "
      2016-06-13 20:14:40.308171 PDT,,,p127516,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","startup pass 3 process (PID 127520) exited with exit code 1",,,,,,,0,,"postmaster.c",4726,
      2016-06-13 20:14:40.308203 PDT,,,p127516,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","aborting startup due to startup process failure",,,,,,,0,,"postmaster.c",3912,
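
      The FATAL entry above identifies the missing relation as the triple 1663/16508/1247. In PostgreSQL-derived systems this path reads as tablespace OID / database OID / relfilenode, and the accompanying detail message says that the directory base/16508 does not exist on the standby. That database OID presumably belongs to the newly created 'gptest' database, although the log does not say so directly. The following is a minimal sketch, not part of HAWQ, that extracts the triple from such a log line and builds the directory path one might check on the standby. The function names and the data directory path are hypothetical, introduced only for illustration.

```python
import re
from pathlib import Path

# Matches the relation triple in messages such as
# "could not open relation 1663/16508/1247: No such file or directory".
RELATION_RE = re.compile(r"could not open relation (\d+)/(\d+)/(\d+)")


def parse_missing_relation(log_text):
    """Return (tablespace_oid, database_oid, relfilenode), or None if absent."""
    match = RELATION_RE.search(log_text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def database_dir(data_dir, database_oid):
    """Build the per-database directory path under the default tablespace."""
    return Path(data_dir) / "base" / str(database_oid)


if __name__ == "__main__":
    line = ('"FATAL","58P01","could not open relation 1663/16508/1247: '
            'No such file or directory"')
    tablespace_oid, database_oid, relfilenode = parse_missing_relation(line)
    print(tablespace_oid, database_oid, relfilenode)
    # Hypothetical location: replace with the standby's master data directory.
    candidate = database_dir("/path/to/master/data", database_oid)
    print(candidate, "exists" if candidate.exists() else "is missing")
```

      Running the check against the standby's data directory before activation would show whether the new database's directory was ever created there, which is the condition the redo pass trips over.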
      

            People

              Assignee: Ming Li (mli)
              Reporter: Chunling Wang (wcl14)