Details
Bug
Status: Closed
Major
Resolution: Fixed
Description
Activating the standby master fails after a new database has been created. It succeeds if no new database has been created, even if a new table has been created and data inserted into it.
1. Create a new database 'gptest'
[gpadmin@test1 ~]$ psql -l
                List of databases
   Name    |  Owner  | Encoding | Access privileges
-----------+---------+----------+-------------------
 postgres  | gpadmin | UTF8     |
 template0 | gpadmin | UTF8     |
 template1 | gpadmin | UTF8     |
(3 rows)

[gpadmin@test1 ~]$ createdb gptest
[gpadmin@test1 ~]$ psql -l
                List of databases
   Name    |  Owner  | Encoding | Access privileges
-----------+---------+----------+-------------------
 gptest    | gpadmin | UTF8     |
 postgres  | gpadmin | UTF8     |
 template0 | gpadmin | UTF8     |
 template1 | gpadmin | UTF8     |
(4 rows)
2. Stop HAWQ master
[gpadmin@test1 ~]$ hawq stop master -a
20160613:20:13:44:068559 hawq_stop:test1:gpadmin-[INFO]:-Prepare to do 'hawq stop'
20160613:20:13:44:068559 hawq_stop:test1:gpadmin-[INFO]:-You can find log in:
20160613:20:13:44:068559 hawq_stop:test1:gpadmin-[INFO]:-/home/gpadmin/hawqAdminLogs/hawq_stop_20160613.log
20160613:20:13:44:068559 hawq_stop:test1:gpadmin-[INFO]:-GPHOME is set to:
20160613:20:13:44:068559 hawq_stop:test1:gpadmin-[INFO]:-/data/pulse-agent-data/HAWQ-main-FeatureTest-opt-mutilnodeparallel-wcl/product/hawq/.
20160613:20:13:44:068559 hawq_stop:test1:gpadmin-[INFO]:-Stop hawq with args: ['stop', 'master']
20160613:20:13:45:068559 hawq_stop:test1:gpadmin-[INFO]:-There are 0 connections to the database
20160613:20:13:45:068559 hawq_stop:test1:gpadmin-[INFO]:-Commencing Master instance shutdown with mode='smart'
20160613:20:13:45:068559 hawq_stop:test1:gpadmin-[INFO]:-Master host=test1
20160613:20:13:45:068559 hawq_stop:test1:gpadmin-[INFO]:-Stop hawq master
20160613:20:13:46:068559 hawq_stop:test1:gpadmin-[INFO]:-Master stopped successfully
3. Activate standby master
[gpadmin@test1 ~]$ ssh test5 'source /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-mutilnodeparallel-wcl/product/hawq/./greenplum_path.sh; hawq activate standby -a'
20160613:20:14:14:126841 hawq_activate:test5:gpadmin-[INFO]:-Prepare to do 'hawq activate'
20160613:20:14:14:126841 hawq_activate:test5:gpadmin-[INFO]:-You can find log in:
20160613:20:14:14:126841 hawq_activate:test5:gpadmin-[INFO]:-/home/gpadmin/hawqAdminLogs/hawq_activate_20160613.log
20160613:20:14:14:126841 hawq_activate:test5:gpadmin-[INFO]:-GPHOME is set to:
20160613:20:14:14:126841 hawq_activate:test5:gpadmin-[INFO]:-/data/pulse-agent-data/HAWQ-main-FeatureTest-opt-mutilnodeparallel-wcl/product/hawq/.
20160613:20:14:14:126841 hawq_activate:test5:gpadmin-[INFO]:-Activate hawq with args: ['activate', 'standby']
20160613:20:14:14:126841 hawq_activate:test5:gpadmin-[INFO]:-Starting to activate standby master 'test5'
20160613:20:14:15:126841 hawq_activate:test5:gpadmin-[INFO]:-HAWQ master is not running, skip
20160613:20:14:15:126841 hawq_activate:test5:gpadmin-[INFO]:-Stopping all the running segments
20160613:20:14:21:126841 hawq_activate:test5:gpadmin-[INFO]:-
20160613:20:14:21:126841 hawq_activate:test5:gpadmin-[INFO]:-Stopping running standby
20160613:20:14:23:126841 hawq_activate:test5:gpadmin-[INFO]:-Update master host name in hawq-site.xml
20160613:20:14:31:126841 hawq_activate:test5:gpadmin-[INFO]:-GUC hawq_master_address_host already exist in hawq-site.xml Update it with value: test5
20160613:20:14:31:126841 hawq_activate:test5:gpadmin-[INFO]:-Remove current standby from hawq-site.xml
20160613:20:14:39:126841 hawq_activate:test5:gpadmin-[INFO]:-Start master in master only mode
The command hangs at this point and the master cannot start. The master log shows the following:
2016-06-13 20:14:40.268022 PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","database system was shut down at 2016-06-13 20:02:50 PDT",,,,,,,0,,"xlog.c",6205,
2016-06-13 20:14:40.268112 PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","found recovery.conf file indicating standby takeover recovery needed",,,,,,,0,,"xlog.c",5485,
2016-06-13 20:14:40.268131 PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","checkpoint record is at 0/1C75EF0",,,,,,,0,,"xlog.c",6304,
2016-06-13 20:14:40.268143 PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","redo record is at 0/1C75EF0; undo record is at 0/0; shutdown TRUE",,,,,,,0,,"xlog.c",6338,
2016-06-13 20:14:40.268155 PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","next transaction ID: 0/1003; next OID: 16508",,,,,,,0,,"xlog.c",6342,
2016-06-13 20:14:40.268165 PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","next MultiXactId: 1; next MultiXactOffset: 0",,,,,,,0,,"xlog.c",6345,
2016-06-13 20:14:40.268176 PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","Forcing Crash Recovery for Master Standby takeover",,,,,,,0,,"xlog.c",6389,
2016-06-13 20:14:40.268195 PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","standby takeover recovery in progress",,,,,,,0,,"xlog.c",6427,
2016-06-13 20:14:40.268891 PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","redo starts at 0/1C75F40",,,,,,,0,,"xlog.c",6523,
2016-06-13 20:14:40.273313 PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","record with zero length at 0/2639190",,,,,,,0,,"xlog.c",4110,
2016-06-13 20:14:40.273338 PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","redo done at 0/2639140",,,,,,,0,,"xlog.c",6560,
2016-06-13 20:14:40.273352 PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","end of transaction log location is 0/2639190",,,,,,,0,,"xlog.c",6582,
2016-06-13 20:14:40.273460 PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","standby takeover recovery complete",,,,,,,0,,"xlog.c",5506,
2016-06-13 20:14:40.274904 PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","Need to Repair global sequence number 600 so use scanned maximum value 749 ('gp_persistent_relfile_node')",,,,,,,0,,"cdbpersistentstore.c",519,
2016-06-13 20:14:40.275093 PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","Finished startup pass 1. Proceeding to startup crash recovery passes 2 and 3.",,,,,,,0,,"xlog.c",6816,
2016-06-13 20:14:40.284820 PDT,,,p127519,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","Finished startup crash recovery pass 2",,,,,,,0,,"xlog.c",6987,
2016-06-13 20:14:40.289053 PDT,,,p127520,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","recovery restart point at 0/1C75F40",,,,,"xlog redo checkpoint: redo 0/1C75F40; undo 0/0; tli 1; xid 0/1003; oid 16508; multi 1; offset 0; shutdown REDO PASS 3 @ 0/1C75F40; LSN 0/1C75F90: prev 0/1C75EF0; xid 0: XLOG - checkpoint: redo 0/1C75F40; undo 0/0; tli 1; xid 0/1003; oid 16508; multi 1; offset 0; shutdown",,0,,"xlog.c",8323,
2016-06-13 20:14:40.291597 PDT,,,p127520,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","recovery restart point at 0/1C763A0",,,,,"xlog redo checkpoint: redo 0/1C763A0; undo 0/0; tli 1; xid 0/1021; oid 16508; multi 1; offset 0; shutdown REDO PASS 3 @ 0/1C763A0; LSN 0/1C763F0: prev 0/1C76370; xid 0: XLOG - checkpoint: redo 0/1C763A0; undo 0/0; tli 1; xid 0/1021; oid 16508; multi 1; offset 0; shutdown",,0,,"xlog.c",8323,
2016-06-13 20:14:40.292625 PDT,,,p127520,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","recovery restart point at 0/1C763F0",,,,,"xlog redo checkpoint: redo 0/1C763F0; undo 0/0; tli 1; xid 0/1021; oid 16508; multi 1; offset 0; shutdown REDO PASS 3 @ 0/1C763F0; LSN 0/1C76440: prev 0/1C763A0; xid 0: XLOG - checkpoint: redo 0/1C763F0; undo 0/0; tli 1; xid 0/1021; oid 16508; multi 1; offset 0; shutdown",,0,,"xlog.c",8323,
2016-06-13 20:14:40.295223 PDT,,,p127520,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","recovery restart point at 0/1C76D90",,,,,"xlog redo checkpoint: redo 0/1C76D90; undo 0/0; tli 1; xid 0/1046; oid 16508; multi 1; offset 0; online REDO PASS 3 @ 0/1C76D90; LSN 0/1C76DE0: prev 0/1C76D60; xid 0: XLOG - checkpoint: redo 0/1C76D90; undo 0/0; tli 1; xid 0/1046; oid 16508; multi 1; offset 0; online",,0,,"xlog.c",8323,
2016-06-13 20:14:40.295618 PDT,,,p127520,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","recovery restart point at 0/1C76DE0",,,,,"xlog redo checkpoint: redo 0/1C76DE0; undo 0/0; tli 1; xid 0/1047; oid 16508; multi 1; offset 0; online REDO PASS 3 @ 0/1C76DE0; LSN 0/1C76E30: prev 0/1C76D90; xid 0: XLOG - checkpoint: redo 0/1C76DE0; undo 0/0; tli 1; xid 0/1047; oid 16508; multi 1; offset 0; online",,0,,"xlog.c",8323,
2016-06-13 20:14:40.306365 PDT,,,p127520,th-1212462816,,,,0,,,seg-10000,,,,,"FATAL","58P01","could not open relation 1663/16508/1247: No such file or directory","Database directory ""base/16508"" does not exist",,,,"xlog redo newpage: rel 1663/16508/1247; blk 0 REDO PASS 3 @ 0/1C7B7A8; LSN 0/1C83800: prev 0/1C7B360; xid 1052: Heap - newpage: rel 1663/16508/1247; blk 0",,0,,"md.c",1012,"Stack trace:
1 0x87f232 postgres errstart + 0x252
2 0x7ad57a postgres <symbol not found> + 0x7ad57a
3 0x7ad678 postgres mdnblocks + 0x18
4 0x7af3b6 postgres smgrnblocks + 0x16
5 0x4f97e7 postgres XLogReadBuffer + 0x17
6 0x4c1bf7 postgres heap_redo + 0x4e7
7 0x4eb550 postgres <symbol not found> + 0x4eb550
8 0x4f4b65 postgres StartupXLOG_Pass3 + 0x155
9 0x4f6c08 postgres StartupProcessMain + 0x308
10 0x55629d postgres AuxiliaryProcessMain + 0x5bd
11 0x767706 postgres <symbol not found> + 0x767706
12 0x7689ef postgres <symbol not found> + 0x7689ef
13 0x76d7fd postgres <symbol not found> + 0x76d7fd
14 0x76f34e postgres PostmasterMain + 0xc7e
15 0x6c7e9a postgres main + 0x48a
16 0x3e0541ed1d libc.so.6 __libc_start_main + 0xfd
17 0x4a26a1 postgres <symbol not found> + 0x4a26a1
"
2016-06-13 20:14:40.308171 PDT,,,p127516,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","startup pass 3 process (PID 127520) exited with exit code 1",,,,,,,0,,"postmaster.c",4726,
2016-06-13 20:14:40.308203 PDT,,,p127516,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","aborting startup due to startup process failure",,,,,,,0,,"postmaster.c",3912,
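To help read the FATAL record, here is a minimal Python sketch that pulls the relation identifier out of the error message and compares its database OID with the "next OID" reported at the checkpoint. The function names and the sample string are illustrative only and are not part of HAWQ. The sketch assumes the PostgreSQL convention that the three numbers are tablespace OID, database OID and relfilenode, which matches the log's own detail line about "base/16508". A database OID that is not below the checkpoint's next OID is consistent with a database created after that checkpoint, such as 'gptest', but this is an inference from the log and not a confirmed root cause.

```python
import re

# Message patterns copied from the master log in this report.
RELATION_ERROR = re.compile(
    r"could not open relation (\d+)/(\d+)/(\d+): No such file or directory"
)
NEXT_OID = re.compile(r"next OID: (\d+)")


def parse_missing_relation(log_text):
    """Return (tablespace_oid, database_oid, relfilenode), or None if absent."""
    match = RELATION_ERROR.search(log_text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def parse_next_oid(log_text):
    """Return the 'next OID' value reported at the checkpoint, or None."""
    match = NEXT_OID.search(log_text)
    return int(match.group(1)) if match else None


# Two message fields taken from the log above (CSV framing omitted).
SAMPLE_LOG = (
    "next transaction ID: 0/1003; next OID: 16508\n"
    "could not open relation 1663/16508/1247: No such file or directory\n"
)

if __name__ == "__main__":
    tablespace_oid, database_oid, relfilenode = parse_missing_relation(SAMPLE_LOG)
    print("missing database directory: base/%d" % database_oid)
    # True suggests the database was allocated after the checkpoint (assumption).
    print("database OID not below checkpoint next OID:",
          database_oid >= parse_next_oid(SAMPLE_LOG))
```

The same two functions can be run over the full master log file on the standby host to check whether the missing directory belongs to a database created since the last checkpoint.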