Uploaded image for project: 'Brooklyn'
  1. Brooklyn
  2. BROOKLYN-434

Provisioning of CentOS 7.3 (via jclouds) can leave sshd in bad state

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 0.10.0
    • None
    • None

    Description

      *Underlying Problem*

      There is a bug in CentOS 7.3 (and RHEL 7.3) related to reloading of the SSH daemon configuration. See:

      *Brooklyn Code calling sshd reload*

      The problems occurs when jclouds (or org.apache.brooklyn.policy.jclouds.os.CreateUserPolicy) modifies /etc/ssh/sshd_config and then calls service sshd reload. This can leave systemd's view of sshd in a bad state. A subsequent call to service sshd reload or service sshd restart will not have the desired effect!

      The jclouds code calls service sshd reload if it is configured to lock-down sshd, which is the default - it will configure PasswordAuthentication no and PermitRootLogin no (see https://github.com/apache/brooklyn-server/blob/rel/apache-brooklyn-0.10.0/locations/jclouds/src/main/java/org/apache/brooklyn/location/jclouds/JcloudsLocation.java#L2293).

      Subsequent use of CreateUserPolicy will also call service sshd reload (see https://github.com/apache/brooklyn-server/blob/rel/apache-brooklyn-0.10.0/locations/jclouds/src/main/java/org/apache/brooklyn/policy/jclouds/os/CreateUserPolicy.java#L159-L164).

      *Observed Behaviour: initial jclouds provisioning*

      The exact behaviour we see on CentOS 7.3 is non-deterministic. Below we attempt to describe what can happen.

      TL;DR: systemd thinks sshd is inactive/dead, but the process is really running.

      On a newly provisioned CentOS 7.3 VM, immediately after we've provisioned it through jclouds...

      Looking at the output from jclouds (having tweaked it so that the stderr isn't lost), it executed:

      exec 3<> /etc/ssh/sshd_config && awk -v TEXT="PasswordAuthentication no
      PermitRootLogin no
      " 'BEGIN {print TEXT}{print}' /etc/ssh/sshd_config >&3
      hash service && service ssh reload || service sshd reload || /etc/init.d/ssh* reload
      

      The relevant stderr is:

      Redirecting to /bin/systemctl reload  sshd.service
      Redirecting to /bin/systemctl status  sshd.service
      

      The status of sshd (according the systemd) is:

      $ service sshd status
      
      Redirecting to /bin/systemctl status  sshd.service
      * sshd.service - OpenSSH server daemon
         Loaded: loaded (/usr/lib/systemd/system/sshd.service; enabled; vendor preset: enabled)
         Active: inactive (dead) since Wed 2017-02-08 15:16:18 UTC; 5s ago
           Docs: man:sshd(8)
                 man:sshd_config(5)
        Process: 28941 ExecReload=/bin/kill -HUP $MAINPID (code=exited, status=0/SUCCESS)
        Process: 10346 ExecStart=/usr/sbin/sshd $OPTIONS (code=exited, status=0/SUCCESS)
       Main PID: 10348 (code=exited, status=0/SUCCESS)
         CGroup: /system.slice/sshd.service
                 └─28942 /usr/sbin/sshd
      

      However, the process is really running and we can ssh to it:

      $ cat /var/run/sshd.pid
      28942
      
      $ ps aux | grep "/usr/sbin/sshd"
      root     28942  0.0  0.0  82468  1260 ?        Ss   15:16   0:00 /usr/sbin/sshd
      

      *Observed Behaviour: sshd reload again*

      TL;DR: running reload again fails - the process keeps running, but does not re-load the sshd_config file.

      After running service sshd reload, if the same command is run again we get:

      $ sudo service sshd reload
      Redirecting to /bin/systemctl reload  sshd.service
      Job for sshd.service invalid.
      
      $ echo $?
      1
      
      $ cat /var/run/sshd.pid
      28942
      
      $ ps aux | grep "/usr/sbin/sshd"
      root     28942  0.0  0.0  82468  1260 ?        Ss   15:16   0:00 /usr/sbin/sshd
      
      $ service sshd status
      Redirecting to /bin/systemctl status  sshd.service
      * sshd.service - OpenSSH server daemon
         Loaded: loaded (/usr/lib/systemd/system/sshd.service; enabled; vendor preset: enabled)
         Active: inactive (dead) since Wed 2017-02-08 15:16:18 UTC; 1min 9s ago
           Docs: man:sshd(8)
                 man:sshd_config(5)
        Process: 28941 ExecReload=/bin/kill -HUP $MAINPID (code=exited, status=0/SUCCESS)
        Process: 10346 ExecStart=/usr/sbin/sshd $OPTIONS (code=exited, status=0/SUCCESS)
       Main PID: 10348 (code=exited, status=0/SUCCESS)
         CGroup: /system.slice/sshd.service
                 └─28942 /usr/sbin/sshd
      

      In /var/log/messages, we have:

      Feb  8 15:17:23 host-10-104-0-188 systemd: Unit sshd.service cannot be reloaded because it is inactive.
      

      *Observed Behaviour: restart, after a failed reload*

      TL;DR: systemd does not restart the process (same pid) - the existing process is now recognised as running though, but the /etc/ssh/sshd_config has not be reloaded.

      $ sudo service sshd restart
      Redirecting to /bin/systemctl restart  sshd.service
      
      $ echo $?
      0
      
      $ cat /var/run/sshd.pid
      28942
      
      $ ps aux | grep "/usr/sbin/sshd"
      root     28942  0.0  0.0  82468  1260 ?        Ss   15:16   0:00 /usr/sbin/sshd
      
      $ service sshd status
      Redirecting to /bin/systemctl status  sshd.service
      * sshd.service - OpenSSH server daemon
         Loaded: loaded (/usr/lib/systemd/system/sshd.service; enabled; vendor preset: enabled)
         Active: active (running) since Wed 2017-02-08 15:21:39 UTC; 5s ago
           Docs: man:sshd(8)
                 man:sshd_config(5)
        Process: 28941 ExecReload=/bin/kill -HUP $MAINPID (code=exited, status=0/SUCCESS)
        Process: 29726 ExecStart=/usr/sbin/sshd $OPTIONS (code=exited, status=0/SUCCESS)
       Main PID: 28942 (sshd)
         CGroup: /system.slice/sshd.service
                 └─28942 /usr/sbin/sshd
      
      Feb 08 15:21:39 qa-machine-en-fcap6ij4ya-7bf systemd[1]: Starting OpenSSH server daemon...
      Feb 08 15:21:39 qa-machine-en-fcap6ij4ya-7bf systemd[1]: Started OpenSSH server daemon.
      

      In /var/log/messages, we have:

      Feb  8 15:21:39 host-10-104-0-188 systemd: Starting OpenSSH server daemon...
      Feb  8 15:21:39 host-10-104-0-188 systemd: Started OpenSSH server daemon.
      

      And in /var/log/secure:

      Feb  8 15:21:39 host-10-104-0-188 sudo:     amp : TTY=pts/0 ; PWD=/home/users/amp ; USER=root ; COMMAND=/usr/sbin/service sshd restart
      Feb  8 15:21:39 host-10-104-0-188 polkitd[497]: Registered Authentication Agent for unix-process:29709:569722 (system bus name :1.29 [/usr/bin/pkttyagent --notify-fd 5 --fallback], object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_GB.UTF-8)
      Feb  8 15:21:39 host-10-104-0-188 polkitd[497]: Unregistered Authentication Agent for unix-process:29709:569722 (system bus name :1.29, object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_GB.UTF-8) (disconnected from bus)
      Feb  8 15:21:39 host-10-104-0-188 sshd[29727]: error: Bind to port 22 on 0.0.0.0 failed: Address already in use.
      Feb  8 15:21:39 host-10-104-0-188 sshd[29727]: error: Bind to port 22 on :: failed: Address already in use.
      Feb  8 15:21:39 host-10-104-0-188 sshd[29727]: fatal: Cannot bind any address.
      

      My wild guess at what has happened here is...
      1. systemd thinks sshd is not running, so did not stop it.
      2. systemd starts another sshd process, which fails because process 28942 is already running and listening on port 22.
      3. systemd looks at the pid file, and finds 28942 written there - that sshd process is running, so it reports sshd as "active (running)".

      The sshd process (28942) has not re-loaded the sshd_config file, so changes will not have taken effect.

      *Observed Behaviour: stop + start, after a failed reload*

      Note that calling service sshd stop; service sshd start after a bad reload will behave in the same way as the service sshd restart.

      *Observed Behaviour: subsequent restart, after failed restart*

      A subsequent call to restart now succeeds as expected. The process is replaced, and the changes made in sshd_config take effect.

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              aled.sage Aled Sage
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated: