  1. Mesos
  2. MESOS-1621

Docker run networking should be configurable and support bridge network

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.20.1
    • Component/s: containerization
    • Labels:

      Description

      Currently, to easily support running executors in a Docker image, we hardcode --net=host into docker run so that the slave and executor can reuse the same mechanism to communicate: the slave IP and port are passed in, and the framework responds with its own hostname and port information to set up the tunnel.

      We want to see how to abstract this or even get rid of host networking altogether if we have a good way to not rely on it.
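
A minimal sketch of what "configurable" could mean here, assuming the network mode is passed in rather than hardcoded. The function name `docker_run_args` and the set of accepted modes are hypothetical and only illustrate the idea; this is not the code from the Mesos patch.

```python
from typing import List

# Network modes accepted by this sketch. "host" mirrors the hardcoded
# behavior described above; "bridge" is the mode this issue asks for.
SUPPORTED_NETWORKS = ("host", "bridge", "none")


def docker_run_args(image: str, network: str = "host") -> List[str]:
    """Build a `docker run` argument list with a configurable --net value."""
    if network not in SUPPORTED_NETWORKS:
        raise ValueError(f"unsupported network mode: {network!r}")
    return ["docker", "run", f"--net={network}", image]
```

Defaulting to "host" keeps the existing behavior for callers that do not choose a mode.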

        Activity

        vinodkone Vinod Kone added a comment -

        commit 1453a477511c8f6f22ff16e3dd13d0532e019c5b
        Author: Timothy Chen <tnachen@apache.org>
        Date: Tue Sep 16 18:29:36 2014 -0700

        Enabled bridge network for Docker Containerizer.

        Review: https://reviews.apache.org/r/25270

        tstclair Timothy St. Clair added a comment -

        I'll open up a separate ticket to discuss the API + override conversation.

        bhuvan Bhuvan Arumugam added a comment -

        Timothy Chen sure. I updated the reviewboard thread to remind volunteers to review/submit the patch.

        Retaining it in 0.20.1.

        tnachen Timothy Chen added a comment -

        Bhuvan Arumugam from the reviewboard it seems there aren't any major comments, so I personally think it can go into 0.20.1, as this seems to be blocking a lot of adoption of the Docker + Mesos feature.
        Let me know what else you think needs to change in the patch, or if you think otherwise.

        stevedomin Steve Domin added a comment -

        Jay Buffington cool, thanks for the answer!

        jaybuff Jay Buffington added a comment -

        Steve Domin the current release cadence is about one release per month.

        stevedomin Steve Domin added a comment -

        That makes sense. When can we expect 0.21.0 to be released approximately?

        bhuvan Bhuvan Arumugam added a comment -

        Adding support for --net=bridge is a feature and deserves to be part of 0.21.0, considering the patch will go through a few more iterations of review. It need not be part of the minor bug-fix release 0.20.1.

        If no one disagrees, I'll move it to 0.21.0.

        tstclair Timothy St. Clair added a comment -

        That could be pluggable inside of mesos.

        jaybuff Jay Buffington added a comment -

        What about the day when we want to change the implementation to not use the CLI and use the docker remote API instead?

        tstclair Timothy St. Clair added a comment -

        Having a fully managed interface is always useful from a programmatic perspective; however, after lots of thought...
        I still think it is useful to have some form of command-line override. Otherwise, as the interface to Docker changes, we will constantly need to track versions and manage them, versus creating a pass-through that empowers the user.

        Thoughts?

        thomas Thomas Hoppe added a comment -

        Thx!

        cdoyle Connor Doyle added a comment -

        Thomas Hoppe Here is a link to the associated issue in Marathon if you'd like to follow along: https://github.com/mesosphere/marathon/issues/587

        tnachen Timothy Chen added a comment -

        Meghdoot Bhattacharya once a task launched from an offer finishes, all the resources associated with the task are automatically accounted back to the master, so yes, the ports will be released.

        megh Meghdoot Bhattacharya added a comment -

        Timothy, the approach makes sense because we want in general mesos/framework to select the ports from the offer and not docker daemon. One question I had is if the docker tasks are killed, are the ports released back into the offer, or are they lost? This applies to tasks in MesosContainerizer as well.

        tnachen Timothy Chen added a comment -

        Thomas Hoppe glad this solves your problem, it will be added to the next mesos and marathon release.

        thomas Thomas Hoppe added a comment -

        "or choose to randomly choose ports for the users within the resource offer range for each port the image exposes"

        This is exactly what I need. Marathon could make use of this and provide the chosen port(s) back to the app submitter.
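
A rough sketch of the random assignment described in the quote, assuming the offer's ports resource is a list of inclusive (begin, end) ranges and the image's exposed ports are already known. The function name `assign_host_ports` is hypothetical; this is not how Marathon or Mesos implements it.

```python
import random
from typing import Dict, Iterable, Optional, Tuple

PortRange = Tuple[int, int]  # inclusive (begin, end), assumed offer format


def assign_host_ports(
    exposed: Iterable[int],
    offered: Iterable[PortRange],
    rng: Optional[random.Random] = None,
) -> Dict[int, int]:
    """Pick a distinct random offered host port for each exposed container port."""
    rng = rng or random.Random()
    exposed = list(exposed)
    # Flatten the offered ranges into individual candidate host ports.
    candidates = [p for begin, end in offered for p in range(begin, end + 1)]
    if len(exposed) > len(candidates):
        raise ValueError("not enough offered ports for the exposed ports")
    chosen = rng.sample(candidates, len(exposed))
    # Map container port -> host port so the result can be reported back.
    return dict(zip(exposed, chosen))
```

The returned mapping is what a framework could hand back to the app submitter so they know which host ports were chosen.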

        tnachen Timothy Chen added a comment -

        I want to give an update after discussing the design of port mapping with different folks.
        Currently, with bridge networking mode we must allow the user to expose ports from the container to the host, otherwise the container is not reachable. Docker has two options to do so: 1) expose all ports specified in the image (-P), or 2) explicitly map a host port to a container port. Technically there is a third option, which is to expose just some container ports and let Docker choose which host port to map to.
        The conflicting factor here is that we cannot simply let users map ports that are not part of the ports resource offer, so -P is not a viable option in this case, as we cannot choose which ports end up being assigned.
        Therefore I'm going for the explicit port mapping option, and will also verify that each host port specified is in range of the ports resource used.
        The downside of doing this is that for users who just submit a Docker image through a framework, if the framework doesn't expose information about the ports resource offer it got, the user will not know which ports to explicitly map to.

        This can be mitigated at least by framework developers to help either expose this information, or choose to randomly choose ports for the users within the resource offer range for each port the image exposes.

        The only information the user will need to know is which ports within the container need to be exposed.
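
The explicit-mapping check described in this comment could look roughly like the following. This is an illustrative sketch, not the code from the review: the helper names are hypothetical, and the ports resource is assumed to be a list of inclusive (begin, end) ranges.

```python
from typing import Iterable, List, Tuple

PortRange = Tuple[int, int]    # inclusive (begin, end), assumed offer format
PortMapping = Tuple[int, int]  # (host_port, container_port)


def host_port_offered(port: int, ranges: Iterable[PortRange]) -> bool:
    """Return True if the host port falls inside any offered range."""
    return any(begin <= port <= end for begin, end in ranges)


def port_mapping_args(
    mappings: Iterable[PortMapping], offered: Iterable[PortRange]
) -> List[str]:
    """Validate each host port against the offer, then emit -p arguments."""
    offered = list(offered)
    args: List[str] = []
    for host_port, container_port in mappings:
        if not host_port_offered(host_port, offered):
            raise ValueError(f"host port {host_port} is not in the offered ports")
        # docker run's -p flag takes host_port:container_port.
        args += ["-p", f"{host_port}:{container_port}"]
    return args
```

Rejecting the whole request on the first out-of-range host port keeps the container from binding ports that were never part of the offer.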

        tnachen Timothy Chen added a comment -

        I've added a reviewboard with the proto changes showing what I think the API changes look like. Please take a look and provide any feedback you want!

        https://reviews.apache.org/r/25270/

        megh Meghdoot Bhattacharya added a comment -

        Correct. Thx

        jaybuff Jay Buffington added a comment -

        Timothy Chen I think he is referring to https://github.com/mesosphere/mesos-docker

        tnachen Timothy Chen added a comment -

        Meghdoot Bhattacharya can you provide me a link to the "mesos-docker" you're referring to? Is it Deimos?

        megh Meghdoot Bhattacharya added a comment -

        Having a network namespace is the norm in the Docker world. Not having that feature is a major impediment. In fact, the host-only networking feature was added very recently in Docker to support some special use cases, and it is more of an exception. In the Docker world, different apps generally may use the same bind port in the container namespace and rely on the dynamic host port to avoid collisions. Service discovery mechanisms then use the dynamic port.

        I would like to see support similar to mesosphere's "mesos-docker" executor feature, where it used "ports as a resource" from Mesos. Marathon would take a ports argument, and internally the executor validated the exposed ports with docker inspect before doing the NAT mapping. In fact, if I remember correctly, the dynamic ports were also set as environment variables inside the container. The Marathon scheduler did the port assignment from the port resources.

        In general, whether the docker0 bridge or some other custom bridge is used, in most cases if the slave IP:port is passed and the container passes its private IP and port, there should not be any issue in communication. And I think the slave is already binding to all interfaces today (I may be wrong).

        If more time is needed for this feature, is it possible to not use host networking when there is no executor specified? It would also be good to have the functionality mentioned for mesos-docker above, because using a custom executor within Docker is more of a special case, I would think. Most cases will run Docker containers like regular tasks.
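
To illustrate the idea of exposing the assigned ports as environment variables inside the container, here is a small sketch. The variable naming scheme `PORT_<container_port>` and the function name are assumptions made for this example; the comment does not say which names mesos-docker actually used.

```python
from typing import Dict, List


def port_env_args(assignments: Dict[int, int]) -> List[str]:
    """Turn a container_port -> host_port mapping into docker run -e arguments.

    The PORT_<container_port> naming is hypothetical, chosen for illustration.
    """
    args: List[str] = []
    # Sort by container port so the generated arguments are deterministic.
    for container_port, host_port in sorted(assignments.items()):
        args += ["-e", f"PORT_{container_port}={host_port}"]
    return args
```

A process in the container could then read such a variable to learn which host port was mapped to the port it binds, for example when registering with a service discovery mechanism.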

        tnachen Timothy Chen added a comment -

        I'm explicitly trying to avoid a plain args override in general, because it makes transitioning to another implementation quite difficult: we no longer know what breaks for people, limiting options becomes more of a hassle, and we need to handle a lot more cases where the user passes in configuration that can break things (--name test).

        Although it gives users the flexibility to use more Docker features as they become ready, there is too much cost to pay for that, IMO.
        Let me know if you have other thoughts.

        tstclair Timothy St. Clair added a comment -

        Ideally yes, but I don't know what will be possible in round 1.
        I also wonder how much we could directly expose the raw API and fill in defaults. It almost seems cleaner than string parsing.

        tnachen Timothy Chen added a comment -

        When you mention an args override, do you mean an override to provide any Docker options, not just the network?

        tstclair Timothy St. Clair added a comment - - edited

        Here is a state space: https://github.com/GoogleCloudPlatform/kubernetes/issues/494
        Here is my hope for the future: https://groups.google.com/forum/#!topic/docker-dev/6tt1y9FTWKg

        Either way, we need to enable an args override.


          People

          • Assignee:
            tnachen Timothy Chen
            Reporter:
            tnachen Timothy Chen
            Shepherd:
            Benjamin Hindman
          • Votes:
            8
            Watchers:
            15
