Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Description
In a cluster that already has running QEs (query executors), if you kill the postmaster process of a segment (pid=59335), queries can still run and the segment's status in gp_segment_configuration remains up.
ps -ef |grep postgres
502 59309 1 0 10:07AM ?? 0:05.39 /Users/intern/work/code/main/hawq-db-devel/bin/postgres -D /Users/intern/hawq-data-directory/masterdd -i -M master -p 5432 --silent-mode=true
502 59310 59309 0 10:07AM ?? 0:00.38 postgres: port 5432, master logger process
502 59313 59309 0 10:07AM ?? 0:00.16 postgres: port 5432, stats collector process
502 59314 59309 0 10:07AM ?? 0:01.89 postgres: port 5432, writer process
502 59315 59309 0 10:07AM ?? 0:00.27 postgres: port 5432, checkpoint process
502 59316 59309 0 10:07AM ?? 0:00.09 postgres: port 5432, seqserver process
502 59317 59309 0 10:07AM ?? 0:00.29 postgres: port 5432, WAL Send Server process
502 59318 59309 0 10:07AM ?? 0:00.01 postgres: port 5432, DFS Metadata Cache process
502 59319 59309 0 10:07AM ?? 0:10.02 postgres: port 5432, master resource manager
502 59335 1 0 10:07AM ?? 0:12.94 /Users/intern/work/code/main/hawq-db-devel/bin/postgres -D /Users/intern/hawq-data-directory/segmentdd -i -M segment -p 40000 --silent-mode=true
502 59336 59335 0 10:07AM ?? 0:00.61 postgres: port 40000, logger process
502 59403 59309 0 10:07AM ?? 0:02.28 postgres: port 5432, intern intern [local] con11 cmd63 idle [local]
502 63451 59335 0 10:25AM ?? 0:00.12 postgres: port 40000, stats collector process
502 63452 59335 0 10:25AM ?? 0:01.43 postgres: port 40000, writer process
502 63453 59335 0 10:25AM ?? 0:00.20 postgres: port 40000, checkpoint process
502 63454 59335 0 10:25AM ?? 0:03.64 postgres: port 40000, segment resource manager
502 63966 59335 0 10:27AM ?? 0:04.88 postgres: port 40000, intern intern 127.0.0.1(56871) con11 seg0 idle
502 63967 59335 0 10:27AM ?? 0:04.90 postgres: port 40000, intern intern 127.0.0.1(56873) con11 seg1 idle
502 63968 59335 0 10:27AM ?? 0:07.12 postgres: port 40000, intern intern 127.0.0.1(56875) con11 seg2 idle
502 63969 59335 0 10:27AM ?? 0:07.12 postgres: port 40000, intern intern 127.0.0.1(56877) con11 seg3 idle
502 63970 59335 0 10:27AM ?? 0:04.89 postgres: port 40000, intern intern 127.0.0.1(56879) con11 seg4 idle
502 63971 59335 0 10:27AM ?? 0:04.86 postgres: port 40000, intern intern 127.0.0.1(56881) con11 seg5 idle

kill -9 59335

ps -ef |grep postgres
502 59309 1 0 10:07AM ?? 0:05.64 /Users/intern/work/code/main/hawq-db-devel/bin/postgres -D /Users/intern/hawq-data-directory/masterdd -i -M master -p 5432 --silent-mode=true
502 59310 59309 0 10:07AM ?? 0:00.40 postgres: port 5432, master logger process
502 59313 59309 0 10:07AM ?? 0:00.17 postgres: port 5432, stats collector process
502 59314 59309 0 10:07AM ?? 0:02.01 postgres: port 5432, writer process
502 59315 59309 0 10:07AM ?? 0:00.28 postgres: port 5432, checkpoint process
502 59316 59309 0 10:07AM ?? 0:00.09 postgres: port 5432, seqserver process
502 59317 59309 0 10:07AM ?? 0:00.31 postgres: port 5432, WAL Send Server process
502 59318 59309 0 10:07AM ?? 0:00.01 postgres: port 5432, DFS Metadata Cache process
502 59319 59309 0 10:07AM ?? 0:10.64 postgres: port 5432, master resource manager
502 59336 1 0 10:07AM ?? 0:00.64 postgres: port 40000, logger process
502 59403 59309 0 10:07AM ?? 0:02.40 postgres: port 5432, intern intern [local] con11 cmd67 idle [local]
502 63454 1 0 10:25AM ?? 0:03.96 postgres: port 40000, segment resource manager
502 63966 1 0 10:27AM ?? 0:04.96 postgres: port 40000, intern intern 127.0.0.1(56871) con11 seg0 idle
502 63967 1 0 10:27AM ?? 0:04.98 postgres: port 40000, intern intern 127.0.0.1(56873) con11 seg1 idle
502 63968 1 0 10:27AM ?? 0:07.20 postgres: port 40000, intern intern 127.0.0.1(56875) con11 seg2 idle
502 63969 1 0 10:27AM ?? 0:07.21 postgres: port 40000, intern intern 127.0.0.1(56877) con11 seg3 idle
502 63970 1 0 10:27AM ?? 0:04.98 postgres: port 40000, intern intern 127.0.0.1(56879) con11 seg4 idle
502 63971 1 0 10:27AM ?? 0:04.94 postgres: port 40000, intern intern 127.0.0.1(56881) con11 seg5 idle
Then we execute an INSERT statement. It succeeds, and the segment is still reported as up:
intern=# select count(*) from b;
  count
----------
 41058000
(1 row)

intern=# insert into b VALUES (1);
INSERT 0 1
intern=# select count(*) from b;
  count
----------
 41058001
(1 row)

intern=# select * from gp_segment_configuration ;
 registration_order | role | status | port  |  hostname  |  address
--------------------+------+--------+-------+------------+------------
                  0 | m    | u      |  5432 | doli.local | doli.local
                  1 | p    | u      | 40000 | localhost  | 127.0.0.1
(2 rows)
If the existing QEs are enough to execute the query, it succeeds. Otherwise the query asks the postmaster to create new QEs, finds that the postmaster is not alive, and only then marks the segment as down.
The problem is that nothing checks whether the segment's postmaster process is still alive. That liveness check should be added.
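To illustrate what such a liveness check looks like from the outside, here is a minimal shell sketch. It is not the fix that was applied in HAWQ. It assumes the segment's data directory contains a PostgreSQL-style postmaster.pid file whose first line is the postmaster PID, and that the caller runs as the same OS user as the postmaster (otherwise kill -0 can fail with a permission error even though the process exists). The function name postmaster_alive is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) only if the postmaster that
# owns the given data directory is still running.
# Assumption: <datadir>/postmaster.pid exists and its first line is the PID.
postmaster_alive() {
    datadir=$1
    pidfile="$datadir/postmaster.pid"

    # No readable pid file: treat the postmaster as not running.
    [ -r "$pidfile" ] || return 1

    pid=$(head -n 1 "$pidfile")

    # Reject an empty or non-numeric first line.
    case $pid in
        ''|*[!0-9]*) return 1 ;;
    esac

    # Signal 0 delivers nothing; it only tests whether the process exists
    # and whether we are allowed to signal it.
    kill -0 "$pid" 2>/dev/null
}

# Optional command-line use: sh check.sh <datadir>
if [ "$#" -ge 1 ]; then
    if postmaster_alive "$1"; then
        echo "postmaster alive"
    else
        echo "postmaster not running"
    fi
fi
```

In the scenario above, running this against /Users/intern/hawq-data-directory/segmentdd after kill -9 59335 would be expected to report the postmaster as not running, because kill -9 leaves the stale pid file behind while the process itself is gone.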