  1. VCL
  2. VCL-839

Problems occur when "localhost" is used for a management node name

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4
    • Fix Version/s: 2.4.1
    • Component/s: vcld (backend)
    • Labels: None

    Description

      The vcl-install.sh script uses localhost as the name of the management node by default. The FQDN parameter in /etc/vcl/vcld.conf is set to localhost, as is the managementnode.hostname value in the database.

      The backend code needs to determine the private IP address being used on the management node. This is not stored in the database. Only the management node's hostname and an ambiguous IPaddress value are stored in the managementnode table. The IPaddress value should be set to the public IP address so that management nodes which don't share the same private network can communicate.

      To determine its own private IP address, the management node attempts to resolve its hostname, localhost, which resolves to 127.0.0.1. After this step, the code compares the resolved IP address to the addresses assigned to the management node's interfaces. The loopback interface's IP addresses are explicitly excluded because there would be no reason for the code to ever use a loopback address.

      This introduces the first problem, which is mostly cosmetic at this point. The following warning is generated:

      |30351|3|3|new|OS.pm:get_private_interface_name|1451| ---- WARNING ----
      |30351|3|3|new|OS.pm:get_private_interface_name|1451| 2015-03-18 14:17:32|30351|3|3|new|OS.pm:get_private_interface_name|1451|failed to determine private interface name, no interface is assigned the private IP address for the reservation: 127.0.0.1
      |30351|3|3|new|OS.pm:get_private_interface_name|1451| : {
      |30351|3|3|new|OS.pm:get_private_interface_name|1451| :   "eth0" => {
      |30351|3|3|new|OS.pm:get_private_interface_name|1451| :     "broadcast_address" => "10.x.x.x",
      |30351|3|3|new|OS.pm:get_private_interface_name|1451| :     "ip_address" => {
      |30351|3|3|new|OS.pm:get_private_interface_name|1451| :       "10.x.x.x" => "255.255.240.0"
      |30351|3|3|new|OS.pm:get_private_interface_name|1451| :     },
      |30351|3|3|new|OS.pm:get_private_interface_name|1451| :     "name" => "eth0",
      |30351|3|3|new|OS.pm:get_private_interface_name|1451| :     "physical_address" => "00:50:56:23:00:bc"
      |30351|3|3|new|OS.pm:get_private_interface_name|1451| :   },
      |30351|3|3|new|OS.pm:get_private_interface_name|1451| :   "eth1" => {
      |30351|3|3|new|OS.pm:get_private_interface_name|1451| :     "broadcast_address" => "x.x.x.x",
      |30351|3|3|new|OS.pm:get_private_interface_name|1451| :     "default_gateway" => "x.x.x.x",
      |30351|3|3|new|OS.pm:get_private_interface_name|1451| :     "ip_address" => {
      |30351|3|3|new|OS.pm:get_private_interface_name|1451| :       "152.46.18.135" => "255.255.248.0"
      |30351|3|3|new|OS.pm:get_private_interface_name|1451| :     },
      |30351|3|3|new|OS.pm:get_private_interface_name|1451| :     "name" => "eth1",
      |30351|3|3|new|OS.pm:get_private_interface_name|1451| :     "physical_address" => "00:50:56:23:00:bd"
      |30351|3|3|new|OS.pm:get_private_interface_name|1451| :   },
      |30351|3|3|new|OS.pm:get_private_interface_name|1451| :   "lo" => {
      |30351|3|3|new|OS.pm:get_private_interface_name|1451| :     "ip_address" => {},
      |30351|3|3|new|OS.pm:get_private_interface_name|1451| :     "name" => "lo"
      |30351|3|3|new|OS.pm:get_private_interface_name|1451| :   }
      |30351|3|3|new|OS.pm:get_private_interface_name|1451| : }
      |30351|3|3|new|OS.pm:get_private_interface_name|1451| ( 0) OS.pm, get_private_interface_name (line: 1451)
      |30351|3|3|new|OS.pm:get_private_interface_name|1451| (-1) OS.pm, get_private_network_configuration (line: 1695)
      |30351|3|3|new|OS.pm:get_private_interface_name|1451| (-2) (eval 762), (eval) (line: 1)
      |30351|3|3|new|OS.pm:get_private_interface_name|1451| (-3) OS.pm, get_ip_address (line: 1846)
      |30351|3|3|new|OS.pm:get_private_interface_name|1451| (-4) OS.pm, get_private_ip_address (line: 1901)
      |30351|3|3|new|OS.pm:get_private_interface_name|1451| (-5) Linux.pm, post_load (line: 418)
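
The lookup that produces this warning can be sketched roughly as follows. This is a minimal Python illustration, not VCL's Perl code: the function name find_private_interface is hypothetical, the dictionary layout is modeled on the dump above, and 10.0.0.5 is an assumed stand-in for the masked 10.x.x.x address.

```python
import ipaddress


def find_private_interface(resolved_ip, interfaces):
    """Return the name of the non-loopback interface assigned resolved_ip.

    Returns None when no interface matches, which is the case that
    triggers the warning when the hostname resolves to 127.0.0.1.
    """
    target = ipaddress.ip_address(resolved_ip)
    for name, info in interfaces.items():
        for address in info.get("ip_address", {}):
            candidate = ipaddress.ip_address(address)
            # Loopback addresses are never considered, mirroring the
            # explicit exclusion described in the report.
            if candidate.is_loopback:
                continue
            if candidate == target:
                return name
    return None


# Hypothetical interface data; 10.0.0.5 stands in for the masked address.
interfaces = {
    "eth0": {"ip_address": {"10.0.0.5": "255.255.240.0"}},
    "eth1": {"ip_address": {"152.46.18.135": "255.255.248.0"}},
    "lo": {"ip_address": {"127.0.0.1": "255.0.0.0"}},
}

localhost_match = find_private_interface("127.0.0.1", interfaces)
private_match = find_private_interface("10.0.0.5", interfaces)
print(localhost_match, private_match)
```

With a hostname that resolves to the real private address, the same lookup finds eth0; with localhost it finds nothing, so every later step that depends on the private IP address has nothing to work with.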
      

      The next problem occurs when a computer is being loaded. Linux.pm's post_load subroutine attempts to add firewall rules to allow traffic to any port, and specifically to port 22, from the management node's private IP address. This doesn't work as expected because the private IP address could not be determined. As a result, the attempt to allow traffic to any port from the management node's private IP address is skipped:

      |30351|3|3|new|Linux.pm:enable_firewall_port|3655| ---- WARNING ----
      |30351|3|3|new|Linux.pm:enable_firewall_port|3655| 2015-03-18 14:22:44|30351|3|3|new|Linux.pm:enable_firewall_port|3655|firewall not modified, port argument is not restricted to a certain port: 'any', scope argument was not supplied, it must be restricted to certain IP addresses if the port argument is unrestricted
      

      The attempt to allow traffic to port 22 is completed. However, because no IP address was specified, traffic is allowed from any address. At this point, the management node can still control the computer.
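
The two outcomes described above (the 'any' rule skipped, the port 22 rule opened to everyone) can be summarized in a small sketch. The function plan_firewall_rule and its return format are hypothetical; only the behavior is taken from the log messages in this report.

```python
ANY_SCOPE = "0.0.0.0/0.0.0.0"  # scope string as it appears in the vcld log


def plan_firewall_rule(port, scope=None):
    """Return the rule that would be applied, or None if it is refused.

    Assumed behavior, inferred from the warnings in this report:
    an unrestricted port requires an explicit scope, while a specific
    port with no scope falls back to allowing any source address.
    """
    if scope is None:
        if port == "any":
            return None  # "firewall not modified"
        scope = ANY_SCOPE
    return {"port": port, "scope": scope}


# The private IP address could not be determined, so no scope is passed.
any_port_rule = plan_firewall_rule("any")
ssh_rule = plan_firewall_rule(22)
print(any_port_rule, ssh_rule)
```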

      After the computer is reserved and the user connects, the code attempts to lock down the firewall to the user's remote IP address. Existing firewall rules for the specific connect method port are replaced when a user initially connects:

      2015-03-18 14:23:35|31054|3|3|reserved|Linux.pm:enable_firewall_port|3734|overwrite existing argument specified, existing tcp/22 firewall rule(s) will be replaced:
      |31054|3|3|reserved|Linux.pm:enable_firewall_port|3734| existing scope: 0.0.0.0/0.0.0.0
      |31054|3|3|reserved|Linux.pm:enable_firewall_port|3734| new scope: y.y.y.y/255.255.255.0
      

      In this example, y.y.y.y is the user's remote IP address.

      Once the firewall is modified, the management node loses control of the computer because the only existing rule which allowed access, port 22 from any IP address, was removed. All commands after this point fail.

      2015-03-18 14:23:35|31054|3|3|reserved|utils.pm:run_ssh_command|4181|executing SSH command on 192.168.2.1 (vm241-1): '/sbin/iptables-save > /etc/sysconfig/iptables'
      |31054|3|3|reserved|utils.pm:run_ssh_command|4291| ---- WARNING ----
      |31054|3|3|reserved|utils.pm:run_ssh_command|4291| 2015-03-18 14:23:35|31054|3|3|reserved|utils.pm:run_ssh_command|4291|attempt 1/3: failed to execute SSH command on 192.168.2.1 (vm241-1): '/sbin/iptables-save > /etc/sysconfig/iptables', exit status: 255, output:
      |31054|3|3|reserved|utils.pm:run_ssh_command|4291| ssh output (/sbin/ipta...): ssh: connect to host 192.168.2.1 port 22: No route to host
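
The lockout can be reproduced on paper with a short simulation. This is a sketch under stated assumptions: the rule table is a plain dictionary rather than real iptables state, 192.168.2.254 is a hypothetical management node address on the computer's private network, and 203.0.113.7 stands in for y.y.y.y.

```python
import ipaddress


def can_reach(source_ip, rules, port):
    """Return True if any scope listed for the port contains source_ip."""
    source = ipaddress.ip_address(source_ip)
    return any(
        source in ipaddress.ip_network(scope, strict=False)
        for scope in rules.get(port, [])
    )


def overwrite_rule(rules, port, new_scope):
    """Replace every existing scope for the port with new_scope."""
    updated = dict(rules)
    updated[port] = [new_scope]
    return updated


MANAGEMENT_IP = "192.168.2.254"  # hypothetical private address
USER_IP = "203.0.113.7"          # hypothetical stand-in for y.y.y.y

# After post_load: port 22 is open to any address.
rules = {22: ["0.0.0.0/0"]}
before = can_reach(MANAGEMENT_IP, rules, 22)

# After the user connects: the tcp/22 rule is replaced with the user's scope.
rules = overwrite_rule(rules, 22, USER_IP + "/255.255.255.0")
management_after = can_reach(MANAGEMENT_IP, rules, 22)
user_after = can_reach(USER_IP, rules, 22)
print(before, management_after, user_after)
```

The management node is reachable before the overwrite and unreachable after it, while the user's address remains allowed, which matches the SSH failure in the log above.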
      

      The user isn't affected at this point because traffic is still allowed from his/her remote IP address. The management node continues to check for a user connection every few minutes, and every check fails. The reservation is not timed out when a management node has no control over the computer.

      Everything is fine for the user as long as he/she does not change location. If the user moves and clicks the Connect button from another remote IP address, the management node won't be able to open the firewall to the new address and the user will not be able to connect.

      User-initiated image captures will also fail:

      |6680|3|3|image|OS.pm:pre_capture|102| ---- WARNING ----
      |6680|3|3|image|OS.pm:pre_capture|102| 2015-03-18 14:31:22|6680|3|3|image|OS.pm:pre_capture|102|unable to complete capture preparation tasks, vm241-1 is powered on but not responding to SSH
      |6680|3|3|image|OS.pm:pre_capture|102| ( 0) OS.pm, pre_capture (line: 102)
      |6680|3|3|image|OS.pm:pre_capture|102| (-1) Linux.pm, pre_capture (line: 331)
      |6680|3|3|image|OS.pm:pre_capture|102| (-2) VMware.pm, capture (line: 752)
      |6680|3|3|image|OS.pm:pre_capture|102| (-3) image.pm, process (line: 179)
      |6680|3|3|image|OS.pm:pre_capture|102| (-4) vcld, make_new_child (line: 587)
      |6680|3|3|image|OS.pm:pre_capture|102| (-5) vcld, main (line: 348)
      

      One simple fix is to not use localhost for the management node name. Another fix would be to edit /etc/hosts on the management node and set localhost to the private IP address. I'm not sure if this will cause other problems if something relies on localhost being a loopback address.

      Regardless, the problems with the code need to be resolved. A management node should never lock itself out.
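
One way to express the "never lock itself out" invariant is a guard that always keeps the management node's address in the scope list and refuses to proceed when that address is a loopback address. This is only an illustrative sketch, not the fix applied in VCL; the function scopes_for_port and the addresses used are hypothetical.

```python
import ipaddress


def scopes_for_port(user_scope, management_ip):
    """Return the scopes to allow for a port, always including the management node.

    Raises ValueError if management_ip is a loopback address, since a rule
    scoped to 127.0.0.1 on the computer would not admit the management node.
    """
    management = ipaddress.ip_address(management_ip)
    if management.is_loopback:
        raise ValueError(
            "management node address %s is a loopback address" % management
        )
    return [user_scope, "%s/32" % management]


# Hypothetical addresses: a user network and a management node private IP.
scopes = scopes_for_port("203.0.113.0/24", "192.168.2.254")
print(scopes)
```

Failing loudly when the address is a loopback address surfaces the misconfiguration at load time instead of after the user has connected.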

          People

            arkurth Andrew Kurth
            arkurth Andrew Kurth