Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 4.1.0, 4.2.0
    • Fix Version/s: 4.1.1, 4.2.0
    • Component/s: Management Server
    • Security Level: Public (Anyone can view this level - this is the default.)

      Description

      Create a VPC
      Create a network in VPC
      Create a VM in network in VPC
      Shut down VM
      Wait awhile

      Observed: Something cleans up the unused network in the VPC, removing the guest network nic from the VPC router.

      Start VM

      Observed: New nic is allocated for VPC router

      List routers

      Observed: Router shows both old and new nics.

      We need to either avoid cleaning up these nics when they aren't used, or avoid showing them whenever a VPC/router is queried for nics.

      As a bonus, I've occasionally seen VPCs that could not be deleted. It seemed to be because some of their nics (the removed ones) had no broadcast URI, and a null pointer exception was thrown.

        Activity

        Marcus Sorensen added a comment -

        Thanks, so will c1ad3b7974449f457a1cc4e50fe7af260d1c5bf6 be backported to 4.1.1?

        ASF subversion and git services added a comment -

        Commit 5cd6d6944a5b80aa401587c90fb0f69863ec4a96 in branch refs/heads/object_store from Min Chen
        [ https://git-wip-us.apache.org/repos/asf?p=cloudstack.git;h=5cd6d69 ]

        CLOUDSTACK-3015: VPC virtual router lists deleted nics

        ASF subversion and git services added a comment -

        Commit e4b98b68da40225be01ba0feb06446fa455f0da1 in branch refs/heads/master-6-17-stable from Min Chen
        [ https://git-wip-us.apache.org/repos/asf?p=cloudstack.git;h=e4b98b6 ]

        CLOUDSTACK-3015: VPC virtual router lists deleted nics

        ASF subversion and git services added a comment -

        Commit 5cd6d6944a5b80aa401587c90fb0f69863ec4a96 in branch refs/heads/master from Min Chen
        [ https://git-wip-us.apache.org/repos/asf?p=cloudstack.git;h=5cd6d69 ]

        CLOUDSTACK-3015: VPC virtual router lists deleted nics

        ASF subversion and git services added a comment -

        Commit 11cfc034e0b45cf032c1e9dcfe32021fb73789d5 in branch refs/heads/4.1 from Min Chen
        [ https://git-wip-us.apache.org/repos/asf?p=cloudstack.git;h=11cfc03 ]

        CLOUDSTACK-3015: VPC virtual router lists deleted nics

        Alena Prokharchyk added a comment -

        1) listRouters problem

        Min, can you please take a look at this bug? It's related to the view system you introduced for the VR. Although the VR's nic is removed, the entry for this nic is still present in the view, so the API still returns it to the user, which is the bug.

        mysql> select id, ip4_address, removed from nics where instance_id=3;
        +----+---------------+---------------------+
        | id | ip4_address   | removed             |
        +----+---------------+---------------------+
        |  8 | 169.254.1.46  | NULL                |
        |  9 | 10.223.159.12 | NULL                |
        | 11 | 10.1.1.1      | 2013-06-18 22:43:06 |
        +----+---------------+---------------------+
        3 rows in set (0.00 sec)

        mysql> select name, ip_address from domain_router_view;
        +--------+---------------+
        | name   | ip_address    |
        +--------+---------------+
        | r-3-st | 169.254.1.46  |
        | r-3-st | 10.223.159.12 |
        | r-3-st | 10.1.1.1      |
        +--------+---------------+
        3 rows in set (0.00 sec)

        In a VPC, the nics on the VR are dynamic and can change over time. The view should reflect those changes.
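        A minimal sketch of the mismatch described in point 1, using Python's built-in sqlite3 rather than MySQL. The cut-down domain_router and nics tables and both view definitions are assumptions made for illustration; they are not CloudStack's actual schema or the real domain_router_view definition. The only point is that a view joining nics without checking the removed column keeps returning the deleted nic.

```python
import sqlite3

# Simplified, hypothetical schema: only the columns used in the queries above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE domain_router (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE nics (
        id INTEGER PRIMARY KEY,
        instance_id INTEGER,
        ip4_address TEXT,
        removed TEXT          -- NULL while the nic is still in use
    );

    INSERT INTO domain_router VALUES (3, 'r-3-st');
    INSERT INTO nics VALUES (8, 3, '169.254.1.46', NULL);
    INSERT INTO nics VALUES (9, 3, '10.223.159.12', NULL);
    INSERT INTO nics VALUES (11, 3, '10.1.1.1', '2013-06-18 22:43:06');

    -- Assumed shape of the buggy view: joins every nic row.
    CREATE VIEW router_view_unfiltered AS
        SELECT r.name AS name, n.ip4_address AS ip_address
        FROM domain_router r
        LEFT JOIN nics n ON n.instance_id = r.id;

    -- Same join, but nics marked as removed are skipped.
    CREATE VIEW router_view_filtered AS
        SELECT r.name AS name, n.ip4_address AS ip_address
        FROM domain_router r
        LEFT JOIN nics n ON n.instance_id = r.id AND n.removed IS NULL;
""")


def router_ips(view_name):
    """Return the sorted IP addresses that a view reports for router r-3-st."""
    rows = conn.execute(
        f"SELECT ip_address FROM {view_name} WHERE name = 'r-3-st'"
    ).fetchall()
    return sorted(ip for (ip,) in rows)


unfiltered_ips = router_ips("router_view_unfiltered")  # includes 10.1.1.1
filtered_ips = router_ips("router_view_filtered")      # removed nic is gone
```

        Putting the removed check in the join condition (rather than in a WHERE clause) is a deliberate choice in this sketch: a router whose nics are all removed would still appear in the view, just without nic data.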

        2) Marcus, the problem with the VPC removal (NPE for the VLAN) is caused by a different problem, introduced by commit a49261c3b1f2d52d637f481e0bcff4b0d58d7b56. This commit changed getNicInNetwork in NetworkModelImpl to return nics that were marked as removed. So when we tried to remove the VPC, we tried to access the removed nic entry. The problem was fixed with c1ad3b7974449f457a1cc4e50fe7af260d1c5bf6.
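        To illustrate the failure mode in point 2, here is a hedged Python sketch (CloudStack itself is Java). The Nic class and the get_nic_in_network function are hypothetical stand-ins, not the real NetworkModelImpl code, and the router instance id used below is an assumption. The sketch only shows why a lookup that can return removed nics hands its caller a row with no broadcast URI.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Nic:
    """Hypothetical, cut-down nic record (not CloudStack's NicVO)."""
    id: int
    network_id: int
    instance_id: int
    broadcast_uri: Optional[str]
    removed: Optional[str] = None  # removal timestamp, None if still active


def get_nic_in_network(
    nics: List[Nic],
    instance_id: int,
    network_id: int,
    include_removed: bool = False,
) -> Optional[Nic]:
    """Return the first matching nic, skipping removed ones by default."""
    for nic in nics:
        if nic.instance_id != instance_id or nic.network_id != network_id:
            continue
        if nic.removed is not None and not include_removed:
            continue
        return nic
    return None


# Router nics modelled on the rows in this ticket; instance id 6 is assumed.
nics = [
    Nic(15, 205, 6, None, removed="2013-06-18 23:04:12"),
    Nic(16, 205, 6, "vlan://2049"),
]

# Including removed rows yields the stale nic, whose broadcast URI is None.
stale = get_nic_in_network(nics, 6, 205, include_removed=True)
# Skipping removed rows yields the live nic with a usable broadcast URI.
active = get_nic_in_network(nics, 6, 205)
```

        Any caller that dereferences the broadcast URI of the stale result without a null check would fail in the way described for VPC removal.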

        3) In the latest build, I don't see the problems with the VM start after the network was initially shut down. After the network is shut down (remember, shutdown happens only when there are no running user vms in the system) and re-implemented with a new vlan as part of a new user vm start, both the user vm and the VR nics were configured with the new vlan.

        Before network shutdown (vlan=2004):

        mysql> select id, ip4_address, removed, broadcast_uri, vm_type from nics where instance_id in (5,6) and network_id=205;
        +----+-------------+---------+---------------+--------------+
        | id | ip4_address | removed | broadcast_uri | vm_type      |
        +----+-------------+---------+---------------+--------------+
        | 14 | 10.1.1.165  | NULL    | vlan://2004   | User         |
        | 15 | 10.1.1.1    | NULL    | vlan://2004   | DomainRouter |
        +----+-------------+---------+---------------+--------------+

        After the network was shut down (the vlan is released; the nic from the guest network is marked as removed on the VR):

        mysql> select id, ip4_address, removed, broadcast_uri, vm_type from nics where instance_id in (5,6) and network_id=205;
        +----+-------------+---------------------+---------------+--------------+
        | id | ip4_address | removed             | broadcast_uri | vm_type      |
        +----+-------------+---------------------+---------------+--------------+
        | 14 | 10.1.1.165  | NULL                | NULL          | User         |
        | 15 | 10.1.1.1    | 2013-06-18 23:04:12 | NULL          | DomainRouter |
        +----+-------------+---------------------+---------------+--------------+

        After the network was implemented again (a new vlan=2049 is allocated for both the VR and the user vm):

        mysql> select id, ip4_address, removed, broadcast_uri, vm_type from nics where instance_id in (5,6) and network_id=205;
        +----+-------------+---------------------+---------------+--------------+
        | id | ip4_address | removed             | broadcast_uri | vm_type      |
        +----+-------------+---------------------+---------------+--------------+
        | 14 | 10.1.1.165  | NULL                | vlan://2049   | User         |
        | 15 | 10.1.1.1    | 2013-06-18 23:04:12 | NULL          | DomainRouter |
        | 16 | 10.1.1.1    | NULL                | vlan://2049   | DomainRouter |
        +----+-------------+---------------------+---------------+--------------+

        So the only problem left is fixing the API response of the listRouters command.

        Alena Prokharchyk added a comment -

        The network GC thread shuts down a network if it has had no running user vms for some time (the thread is controlled by the network.gc.wait and network.gc.interval global configs). As a result of the network shutdown, the nic associated with the guest network is unplugged from the VR (and the corresponding nic is removed). So this is by design.

        So the bug here is that removed nics shouldn't be returned as part of the listRouters call, and those nics should never be considered for cleanup once they are removed.
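        A rough sketch of the GC behaviour described in this comment, written in Python for brevity. The GuestNetwork class, the function, its parameters, and the exact comparison are all assumptions; only the two config names network.gc.wait and network.gc.interval come from this comment, the 600-second value is purely illustrative, and the real thread's logic may differ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GuestNetwork:
    """Hypothetical view of a guest network as seen by a GC pass."""
    id: int
    running_user_vms: int
    idle_since: Optional[float]  # seconds; when the last user VM stopped


def networks_to_shut_down(
    networks: List[GuestNetwork], now: float, gc_wait: float
) -> List[int]:
    """Return ids of networks with no running user VMs for >= gc_wait seconds."""
    eligible = []
    for net in networks:
        # A network with running user VMs (or no idle timestamp) is kept.
        if net.running_user_vms > 0 or net.idle_since is None:
            continue
        if now - net.idle_since >= gc_wait:
            eligible.append(net.id)
    return eligible


networks = [
    GuestNetwork(205, 0, idle_since=0.0),    # idle long enough
    GuestNetwork(206, 1, idle_since=None),   # still has a running user VM
    GuestNetwork(207, 0, idle_since=550.0),  # idle, but not for long enough
]

# One pass, as it might run once per network.gc.interval, with an
# illustrative network.gc.wait of 600 seconds.
eligible_ids = networks_to_shut_down(networks, now=600.0, gc_wait=600.0)
```

        In this model, shutting down network 205 is the step that would unplug the guest nic from the VR and mark it as removed.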

        Marcus Sorensen added a comment -

        I'm escalating this issue because a side effect keeps VMs from starting. If you create a VPC with 1 VM, then stop the VM for a bit, the NIC/network gets cleaned up. Then the VM won't start because its NIC no longer has a broadcast URI.


          People

          • Assignee:
            Min Chen
            Reporter:
            Marcus Sorensen
          • Votes:
            0
            Watchers:
            4

            Dates

            • Created:
              Updated:
              Resolved:
