  1. Apache HAWQ (Retired)
  2. HAWQ-272

Segment status will not be down after killing postmaster process of segment

Details

    Description

      On a cluster that already has running QE processes, killing the postmaster process of the segment (pid=59335) does not stop the segment from serving queries, and the segment's status in gp_segment_configuration stays up.

      ps -ef |grep postgres
        502 59309     1   0 10:07AM ??         0:05.39 /Users/intern/work/code/main/hawq-db-devel/bin/postgres -D /Users/intern/hawq-data-directory/masterdd -i -M master -p 5432 --silent-mode=true
        502 59310 59309   0 10:07AM ??         0:00.38 postgres: port  5432, master logger process
        502 59313 59309   0 10:07AM ??         0:00.16 postgres: port  5432, stats collector process
        502 59314 59309   0 10:07AM ??         0:01.89 postgres: port  5432, writer process
        502 59315 59309   0 10:07AM ??         0:00.27 postgres: port  5432, checkpoint process
        502 59316 59309   0 10:07AM ??         0:00.09 postgres: port  5432, seqserver process
        502 59317 59309   0 10:07AM ??         0:00.29 postgres: port  5432, WAL Send Server process
        502 59318 59309   0 10:07AM ??         0:00.01 postgres: port  5432, DFS Metadata Cache process
        502 59319 59309   0 10:07AM ??         0:10.02 postgres: port  5432, master resource manager
        502 59335     1   0 10:07AM ??         0:12.94 /Users/intern/work/code/main/hawq-db-devel/bin/postgres -D /Users/intern/hawq-data-directory/segmentdd -i -M segment -p 40000 --silent-mode=true
        502 59336 59335   0 10:07AM ??         0:00.61 postgres: port 40000, logger process
        502 59403 59309   0 10:07AM ??         0:02.28 postgres: port  5432, intern intern [local] con11 cmd63 idle [local]
        502 63451 59335   0 10:25AM ??         0:00.12 postgres: port 40000, stats collector process
        502 63452 59335   0 10:25AM ??         0:01.43 postgres: port 40000, writer process
        502 63453 59335   0 10:25AM ??         0:00.20 postgres: port 40000, checkpoint process
        502 63454 59335   0 10:25AM ??         0:03.64 postgres: port 40000, segment resource manager
        502 63966 59335   0 10:27AM ??         0:04.88 postgres: port 40000, intern intern 127.0.0.1(56871) con11 seg0 idle
        502 63967 59335   0 10:27AM ??         0:04.90 postgres: port 40000, intern intern 127.0.0.1(56873) con11 seg1 idle
        502 63968 59335   0 10:27AM ??         0:07.12 postgres: port 40000, intern intern 127.0.0.1(56875) con11 seg2 idle
        502 63969 59335   0 10:27AM ??         0:07.12 postgres: port 40000, intern intern 127.0.0.1(56877) con11 seg3 idle
        502 63970 59335   0 10:27AM ??         0:04.89 postgres: port 40000, intern intern 127.0.0.1(56879) con11 seg4 idle
        502 63971 59335   0 10:27AM ??         0:04.86 postgres: port 40000, intern intern 127.0.0.1(56881) con11 seg5 idle
      
      kill -9 59335
      
      ps -ef |grep postgres
        502 59309     1   0 10:07AM ??         0:05.64 /Users/intern/work/code/main/hawq-db-devel/bin/postgres -D /Users/intern/hawq-data-directory/masterdd -i -M master -p 5432 --silent-mode=true
        502 59310 59309   0 10:07AM ??         0:00.40 postgres: port  5432, master logger process
        502 59313 59309   0 10:07AM ??         0:00.17 postgres: port  5432, stats collector process
        502 59314 59309   0 10:07AM ??         0:02.01 postgres: port  5432, writer process
        502 59315 59309   0 10:07AM ??         0:00.28 postgres: port  5432, checkpoint process
        502 59316 59309   0 10:07AM ??         0:00.09 postgres: port  5432, seqserver process
        502 59317 59309   0 10:07AM ??         0:00.31 postgres: port  5432, WAL Send Server process
        502 59318 59309   0 10:07AM ??         0:00.01 postgres: port  5432, DFS Metadata Cache process
        502 59319 59309   0 10:07AM ??         0:10.64 postgres: port  5432, master resource manager
        502 59336     1   0 10:07AM ??         0:00.64 postgres: port 40000, logger process
        502 59403 59309   0 10:07AM ??         0:02.40 postgres: port  5432, intern intern [local] con11 cmd67 idle [local]
        502 63454     1   0 10:25AM ??         0:03.96 postgres: port 40000, segment resource manager
        502 63966     1   0 10:27AM ??         0:04.96 postgres: port 40000, intern intern 127.0.0.1(56871) con11 seg0 idle
        502 63967     1   0 10:27AM ??         0:04.98 postgres: port 40000, intern intern 127.0.0.1(56873) con11 seg1 idle
        502 63968     1   0 10:27AM ??         0:07.20 postgres: port 40000, intern intern 127.0.0.1(56875) con11 seg2 idle
        502 63969     1   0 10:27AM ??         0:07.21 postgres: port 40000, intern intern 127.0.0.1(56877) con11 seg3 idle
        502 63970     1   0 10:27AM ??         0:04.98 postgres: port 40000, intern intern 127.0.0.1(56879) con11 seg4 idle
        502 63971     1   0 10:27AM ??         0:04.94 postgres: port 40000, intern intern 127.0.0.1(56881) con11 seg5 idle
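
      The second listing differs from the first in one telling way: the segment postmaster (59335) is gone, and the surviving port-40000 processes now have PID 1 as their parent. As a rough illustration only, the sketch below scans `ps -ef` output in the column layout shown above and reports such orphans. The function name `orphaned_segment_processes` and the `SAMPLE` excerpt are illustrative assumptions, not part of HAWQ.

```python
import re

# One `ps -ef` line in the layout shown above:
# UID PID PPID C STIME TTY TIME CMD
PS_LINE = re.compile(
    r"^\s*\S+\s+(?P<pid>\d+)\s+(?P<ppid>\d+)\s+\S+\s+\S+\s+\S+\s+\S+\s+(?P<cmd>.*)$"
)

# Excerpt of the listing taken after `kill -9 59335`.
SAMPLE = """\
  502 59309     1   0 10:07AM ??         0:05.64 /Users/intern/work/code/main/hawq-db-devel/bin/postgres -D /Users/intern/hawq-data-directory/masterdd -i -M master -p 5432 --silent-mode=true
  502 59310 59309   0 10:07AM ??         0:00.40 postgres: port  5432, master logger process
  502 59336     1   0 10:07AM ??         0:00.64 postgres: port 40000, logger process
  502 63454     1   0 10:25AM ??         0:03.96 postgres: port 40000, segment resource manager
  502 63966     1   0 10:27AM ??         0:04.96 postgres: port 40000, intern intern 127.0.0.1(56871) con11 seg0 idle
"""


def orphaned_segment_processes(ps_output, port):
    """Return (pid, cmd) pairs for child processes of the given port
    whose parent is PID 1, i.e. whose postmaster no longer exists."""
    # Child processes carry a "postgres: port NNNN," title; the postmaster
    # itself shows the full binary path instead, so it never matches here.
    title = re.compile(r"^postgres: port\s+%d," % port)
    orphans = []
    for line in ps_output.splitlines():
        m = PS_LINE.match(line)
        if not m:
            continue  # skip headers or lines in another layout
        if int(m.group("ppid")) == 1 and title.match(m.group("cmd")):
            orphans.append((int(m.group("pid")), m.group("cmd")))
    return orphans


for pid, cmd in orphaned_segment_processes(SAMPLE, 40000):
    print(pid, cmd)
```

      Matching on the process title is fragile and depends on the exact `ps` column layout, so this is a diagnostic aid rather than a health check.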
      

      Then we run an INSERT statement and check the segment status:

      intern=# select count(*) from b;
        count
      ----------
       41058000
      (1 row)
      
      intern=# insert into b VALUES (1);
      INSERT 0 1
      intern=# select count(*) from b;
        count
      ----------
       41058001
      (1 row)
      intern=# select * from gp_segment_configuration ;
       registration_order | role | status | port  |  hostname  |  address
      --------------------+------+--------+-------+------------+------------
                        0 | m    | u      |  5432 | doli.local | doli.local
                        1 | p    | u      | 40000 | localhost  | 127.0.0.1
      (2 rows)
      

      If the existing QEs are enough to execute the query, it succeeds. Otherwise the postmaster is asked to create a new QE; only then is the postmaster found to be dead and the segment marked as down.
      The problem is that the liveness of the segment's postmaster process is not checked. It should be.
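
      One way such a liveness check could look is sketched below. It is not the fix adopted in HAWQ. It assumes the segment data directory holds a PostgreSQL-style `postmaster.pid` file whose first line is the postmaster's PID (standard in PostgreSQL, assumed here to carry over to HAWQ), and it assumes a POSIX system. The function name `postmaster_alive` is hypothetical.

```python
import os


def postmaster_alive(data_dir):
    """Return True if the PID recorded in <data_dir>/postmaster.pid
    refers to a process that currently exists."""
    pid_file = os.path.join(data_dir, "postmaster.pid")
    try:
        with open(pid_file) as f:
            pid = int(f.readline().strip())
    except (OSError, ValueError):
        return False  # no PID file, or its first line is not a number
    if pid <= 0:
        return False  # never signal a process group or "all processes"
    try:
        os.kill(pid, 0)  # signal 0 only tests for existence; nothing is sent
    except ProcessLookupError:
        return False  # no process with that PID
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True
```

      For the reproduction above, calling `postmaster_alive("/Users/intern/hawq-data-directory/segmentdd")` after the `kill -9` would be expected to return False, provided the assumption about `postmaster.pid` holds. A check like this can be fooled if the PID has been reused by an unrelated process, so a real implementation would likely verify more than bare existence.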

          People

            wlin Wen Lin
            doli Dong Li