Avro / AVRO-1537

Make it easier to set up a multi-language build environment

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.7.8, 1.8.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      It's currently quite tedious to set up an environment in which the Avro test suites for all supported languages can be run, and in which release candidates can be built. This is especially so when we need to test against several different versions of a programming language or VM (e.g. JDK6/JDK7/JDK8, Ruby 1.8.7/1.9.3/2.0/2.1).

      Our shared Hudson server isn't an ideal solution, because it only runs tests on changes that are already committed, and maintenance of the server can't easily be shared across the community.

      I think a Docker image might be a good solution, since it could be set up by one person, shared with all Avro developers, and maintained by the community on an ongoing basis. But other VM solutions (Vagrant, for example?) might work just as well. Suggestions welcome.

      Related resources:

      1. AVRO-1537.patch
        5 kB
        Tom White
      2. AVRO-1537.patch
        4 kB
        Tom White
      3. AVRO-1537.patch
        4 kB
        Tom White
      4. AVRO-1537.patch
        5 kB
        Tom White
      5. AVRO-1537.patch
        5 kB
        Tom White
      6. AVRO-1537.patch
        5 kB
        Tom White
      7. AVRO-1537.patch
        5 kB
        Tom White
      8. AVRO-1537.patch
        5 kB
        Tom White
      9. AVRO-1537.patch
        5 kB
        Doug Cutting
      10. AVRO-1537-2015-01-08.patch
        5 kB
        Niels Basjes
      11. AVRO-1537.patch
        5 kB
        Doug Cutting
      12. AVRO-1537.patch
        5 kB
        Tom White

          Activity

          martinkl Martin Kleppmann added a comment -

          I'm happy to give this a try, but I don't know a lot about Docker, so it'll take me a bit of time to learn the basics. If anyone is more experienced in this area, please jump in.
          tomwhite Tom White added a comment -

          I created a Dockerfile that does this. I can run ./build.sh test and ./build.sh dist; however, there are a few failures (which I will open JIRAs for).

          Please try it out and send feedback!
          tomwhite Tom White added a comment -

          Forgot to say that I based the Dockerfile on the one from HBase, in particular the way that it bundles a JDK and Maven.
          cutting Doug Cutting added a comment -

          Tom, this looks great!

          It'd be nice to get rid of the manual installs.

          Why can't we install Java and Maven with apt-get? The versions in Ubuntu 14.04 look fine to me:

          http://packages.ubuntu.com/trusty/maven
          http://packages.ubuntu.com/trusty/openjdk-7-jdk

          As for Forrest, might we use wget if the file doesn't exist?

          wget http://archive.apache.org/dist/forrest/0.8/apache-forrest-0.8.tar.gz
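          Doug's conditional-download suggestion could look something like the sketch below. The helper name fetch_if_missing is hypothetical, used only to show the check-before-wget pattern:

```shell
# Hypothetical helper sketching the suggestion above: download a release
# tarball only when it is not already present in the build directory, so
# rebuilding the image doesn't re-fetch it every time.
fetch_if_missing() {
  local url="$1"
  local file="${url##*/}"   # basename, e.g. apache-forrest-0.8.tar.gz
  if [ ! -f "$file" ]; then
    wget "$url"
  fi
}

# Usage (not run here):
# fetch_if_missing http://archive.apache.org/dist/forrest/0.8/apache-forrest-0.8.tar.gz
```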
          tomwhite Tom White added a comment -

          Thanks for taking a look, Doug. I've updated the patch with your suggestions. The Dockerfile is based on an OpenJDK 7 image (which uses Ubuntu 14.04), and it installs Maven with apt-get and Forrest using curl, so there are no manual steps now.

          There are two failures in the tests: lang/js (see AVRO-1573) and lang/php (AVRO-1621). I don't think either of these needs to prevent this from being committed, though.

          Interestingly, I hit https://bugs.launchpad.net/ubuntu/+bug/1401390, which just popped up, but I added --no-install-recommends to apt-get as a workaround, as suggested in that thread.
          nielsbasjes Niels Basjes added a comment -

          I'm just reading up on Docker and had an idea.
          What if, instead of doing

          # Clone Avro
          RUN git clone http://git.apache.org/avro.git
          RUN svn checkout https://svn.apache.org/repos/asf/avro/trunk/ avro-trunk

          you used something like this (warning: I just read about this command; I haven't had time to try it out yet)?

          ONBUILD ADD . avro

          See: https://docs.docker.com/reference/builder/#onbuild

          That would add your local version of Avro (which will most likely contain the change you're working on) to the container.
          In that scenario you would be able to run all tests for all languages/versions on the patched code before submitting it.
          nielsbasjes Niels Basjes added a comment -

          Perhaps using this option with "docker run" is better:

          -v $(pwd):/somewhere/avro
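          A full invocation with such a bind mount might look like the sketch below. The image name avro-build and the mount point /root/avro are assumptions; the function only composes the command string so the mount flags are visible without a docker daemon:

```shell
# Compose (but don't execute) a docker run command that bind-mounts a local
# checkout into the container. Image name 'avro-build' and mount point
# '/root/avro' are placeholders for illustration.
avro_docker_cmd() {
  local workdir="$1"
  echo "docker run --rm -t -i -v ${workdir}:/root/avro -w /root/avro avro-build"
}

avro_docker_cmd "$(pwd)"
```

          Copy-pasting the printed command (with docker installed) would start the container with the local tree mounted read-write.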
          cutting Doug Cutting added a comment -

          This is great. It works for me!

          It would be great if one could easily use this to create tag and branch workspaces in addition to trunk. Can we somehow pass a parameter to indicate this?

          Why check out from both SVN and git? Is that just for exemplary purposes?

          Might we somehow use this to create a release-building script?
          nielsbasjes Niels Basjes added a comment -

          Hi Tom White, I played around with the last version you uploaded. (Apparently I wasn't looking at the latest version when I wrote the previous comment.)

          What I changed locally is that I removed the last 3 lines from your Dockerfile (i.e. the git clone and svn checkout) and changed build.sh as below.
          Now you can do a simple

          ./build.sh safetest

          and validate the entire build for all languages without needing to install anything tricky.
          If I want to run a single command manually from within that environment I can simply do

          ./build.sh shell

          and get a shell in the docker container.

          Let me know what you think of this.

          diff --git a/build.sh b/build.sh
          index 06961c0..cff8dad 100755
          --- a/build.sh
          +++ b/build.sh
          @@ -17,12 +17,16 @@

           set -e                                           # exit on error

          -cd `dirname "$0"`                                # connect to root
          +SOURCEDIR=$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )
          +cd ${SOURCEDIR}                                  # connect to root

           VERSION=`cat share/VERSION.txt`

          +# The cryptic name is to avoid conflicts in case of a shared development system (like jenkins)
          +DOCKER_IMAGE_NAME="avro_build_image_$(md5sum ${SOURCEDIR}/docker/Dockerfile | cut -d' ' -f1)"
          +
           function usage {
          -  echo "Usage: $0 {test|dist|sign|clean}"
          +  echo "Usage: $0 {test|safetest|shell|dist|sign|clean}"
             exit 1
           }

          @@ -31,6 +35,13 @@ then
             usage
           fi

          +function runindocker {
          +    docker build -t ${DOCKER_IMAGE_NAME} ${SOURCEDIR}/docker
          +    # By mapping the .m2 directory you can do an mvn install from within the container and use the result on your normal system.
          +    # And this also is a significant speedup in subsequent builds because the dependencies are downloaded only once.
          +    docker run --rm=true -t -i -v ${SOURCEDIR}:/root/avro -w /root/avro -v ${HOME}/.m2:/root/.m2 ${DOCKER_IMAGE_NAME} $1 $2 $3 $4 $5 $6
          +}
          +
           set -x                                           # echo commands

           for target in "$@"
          @@ -72,7 +83,14 @@ case "$target" in
                   (cd lang/java; mvn package -DskipTests)
                  # run interop rpc test
                   /bin/bash share/test/interop/bin/test_rpc_interop.sh
          +       ;;
          +
          +    safetest)
          +       runindocker ./build.sh test
          +       ;;

          +    shell)
          +       runindocker /bin/bash
                  ;;

               dist)
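          The hash-derived image name from the diff above can be tried in isolation. This sketch writes a throwaway Dockerfile and shows that the derived tag changes exactly when the file's content changes:

```shell
# Reproduce the DOCKER_IMAGE_NAME scheme on a throwaway Dockerfile: the tag
# is derived from the MD5 of the file's content, so it changes only when
# the Dockerfile changes.
workdir=$(mktemp -d)
printf 'FROM ubuntu:14.04\n' > "${workdir}/Dockerfile"
name1="avro_build_image_$(md5sum "${workdir}/Dockerfile" | cut -d' ' -f1)"

printf 'RUN apt-get update\n' >> "${workdir}/Dockerfile"
name2="avro_build_image_$(md5sum "${workdir}/Dockerfile" | cut -d' ' -f1)"

echo "$name1"
echo "$name2"   # differs from name1 because the file content changed
```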
          tomwhite Tom White added a comment -

          > It would be great if one could easily use this to create tag and branch workspaces in addition to trunk. Can we somehow pass a parameter to indicate this?

          After docker run -it avro-build you can do tagging/branching from the command line. If you use the -v option then you can use the same Avro workspace that you have checked out locally, which is probably the most useful option.

          > Why check out from both SVN and git? Is that just for exemplary purposes?

          Yes. You need SVN to run ./build.sh dist since it does an svn export. Lots of people use git for development so I added that. Maybe I should just remove these lines (as Niels suggests too).

          > Might we somehow use this to create a release-building script?

          Definitely, that would be a good thing to aim for. We'd need to fix the tests first.
          tomwhite Tom White added a comment -

          Thanks for the suggestions, Niels. Here are my comments:

          • I don't think ONBUILD adds anything. As far as I can tell it's for dependent Dockerfiles, which isn't a situation we have here.
          • The name 'safetest' isn't very suggestive. Also, running in docker is orthogonal to build.sh, so all of its arguments are appropriate, not just test. Maybe we need a wrapper script. OTOH, we could delay adding this until we've got more experience of what's useful, since you can just use docker run ....
          • Doing a docker build every time you run docker isn't necessary. Also, I noticed that it can be slow on my machine since apt-get does an update (not sure why this isn't cached by docker).
          • Won't the cryptic docker build name be the same everywhere, since you are calculating the MD5 of the Dockerfile, which doesn't change?
          • Running docker from the top-level and using -v ${SOURCEDIR}:/root/avro is a good idea. I can change the patch to do that.
          tomwhite Tom White added a comment -

          This patch removes the git and SVN checkouts in favour of mounting the local Avro tree under avro.
          cutting Doug Cutting added a comment -

          I wonder if this might better be in share/docker, or perhaps just as Dockerfile in the project root?
          cutting Doug Cutting added a comment -

          It works a lot faster if you also mount your .m2 directory, so it doesn't have to re-download all the java dependencies.

          sudo docker run -it -v ${PWD}:/root/avro -v ${HOME}/.m2:/root/.m2 avro-build
          nielsbasjes Niels Basjes added a comment -

          Hi Tom White,
          I verified on my system: running the script the first time takes 'ages', but running it the second time I see the command prompt within two seconds.
          So if you get very different results I suspect a problem on your system.

          I agree that the 'safetest' name is a poor choice. Perhaps simply having the ./build.sh shell is good enough for the first version?

          I also noticed that all files in the avro directory become owned by root, because that is the default user inside docker.
          So after exiting the docker environment I could no longer do a mvn clean, because the generated files (.class, .jar, etc.) are all stored in directories owned by the root user.

          In an attempt to solve this I came up with the following.
          I effectively create an additional layer that simply recreates the current user (same username, userid, groupid) in the docker environment.

          function runindocker {
              docker build -t ${DOCKER_IMAGE_NAME} ${SOURCEDIR}/docker
              docker build -t ${DOCKER_IMAGE_NAME}_${USER} - << UserSpecificDocker
              FROM ${DOCKER_IMAGE_NAME}
              RUN groupadd -g $(id -g) ${USER}
              RUN useradd  -g $(id -g) -u $(id -u) -k /root -m ${USER}
              ENV HOME /home/${USER}
          UserSpecificDocker
              # By mapping the .m2 directory you can do an mvn install from within the container and use the result on your normal system.
              # And this also is a significant speedup in subsequent builds because the dependencies are downloaded only once.
              docker run --rm=true -t -i -v ${SOURCEDIR}:/home/${USER}/avro -w /home/${USER}/avro -v ${HOME}/.m2:/home/${USER}/.m2 -u ${USER} ${DOCKER_IMAGE_NAME}_${USER} $1 $2 $3 $4 $5 $6
          }
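          The per-user layer above can be previewed without a docker daemon. This sketch only generates the Dockerfile text that runindocker pipes into 'docker build -', so the uid/gid substitution can be inspected before building; the base image name is a placeholder:

```shell
# Generate (without building) the per-user Dockerfile layer that the
# runindocker function above pipes into 'docker build -'. The base image
# name is a placeholder; the uid/gid come from the invoking user.
user_layer() {
  local base="$1"
  local user="${USER:-$(id -un)}"
  cat << EOF
FROM ${base}
RUN groupadd -g $(id -g) ${user}
RUN useradd -g $(id -g) -u $(id -u) -k /root -m ${user}
ENV HOME /home/${user}
EOF
}

user_layer avro_build_image_example
```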
          tomwhite Tom White added a comment -

          Doug> I wonder if this might better be in share/docker, or perhaps just as Dockerfile in the project root?

          Project root won't work unfortunately, since it will copy all the files in there into the image. I can move it to share/docker though.

          Doug> It works a lot faster if you also mount your .m2 directory

          Good idea.

          Niels> I also noticed that all files in the avro directory become owned by root because that is the default user inside docker.

          Good catch! It seems that there isn't a good solution for this in docker (see e.g. this explanation from Solomon Hykes). I tried your runindocker function, but it failed for me since I got a GID clash. Does running './build.sh clean' from the container fix the problem? Perhaps we can just document the workaround, or we could mention the git/svn checkout (or add them back) to avoid the user-mapping problem entirely.
          cutting Doug Cutting added a comment -

          I like the idea of not running as root within the container. It'd be great if we can get this to work. File ownership in ~/.m2/repository also won't be fixed unless we can run as the same user, right?
          tomwhite Tom White added a comment -

          Here's an updated patch that uses Niels' runindocker function, adapted to be used from build.sh. The idea is that you type './build.sh docker' to build an image and run a container. I'm using boot2docker on the Mac, which uses a hard-coded UID and GID, so the script takes that into account. Niels Basjes, does this work for you on Linux?

          I've also moved the Dockerfile to share/docker and mounted the local .m2 directory.
          tomwhite Tom White added a comment -

          New patch to fix an error that was occurring in './build.sh docker' on Mac.
          tomwhite Tom White added a comment -

          A (hopefully final) patch with updated dependencies for AVRO-1573.
          nielsbasjes Niels Basjes added a comment -

          Hi Tom White,

          I checked your current patch and I really like it.
          I did a test on CentOS 6.5 x86_64 inside a VirtualBox VM on Windows 7 and this works really great.
          The first build takes a while, and then subsequent runs are on screen within a second.
          The Java build went perfectly, and I let it run for a while and it all seems to work fine (until it all stops with the known bug ./build.sh: line 25: node_modules/grunt/bin/grunt: No such file or directory).

          A few feedback points:

          1. Using naming like avro-build and avro-build-$USER means that there can only be a single image at a time with that name.
            So in the scenario where you want to enhance the image (i.e. edit the Dockerfile), a different user on the same system cannot use the build system reliably; there will be conflicts between those two users. I realize this is an extremely unlikely scenario in normal development situations. I'm not sure how unlikely it becomes when we make this docker part of the normal build and run it on a shared CI environment (i.e. Jenkins) where everything runs as the same user.
            I think it is fine to do it this way as long as we realize (and preferably document) this caveat. Perhaps a comment as simple as "The "build.sh docker" environment is intended for use on personal development systems." should suffice.
          2. The Dockerfile installs the various tools by naming the tool that needs to be installed, so this docker image will change whenever a new version of one of those tools is released. So I was wondering:
            1. What happens if in the future one of those tools introduces a regression or an incompatible change in its API, like the PHP NaN example? Is that a good thing, because you're actually building and testing against what end users will be using too? Or is it a bad thing, because you have a changing build environment?
            2. What happens if in the future one of those tools is updated and the update is really needed for the build to succeed? How can it be validated/ensured that the image is updated on the developers' desktops too? Perhaps the project can be enhanced to ensure/enforce minimal versions of the tools that are needed?
          3. I vote you push the improvements back to HBase, which you based the original on (at least make a ticket in HBase: Evaluate Docker improvements from AVRO project).
          cutting Doug Cutting added a comment -

          On Linux docker requires sudo.

          With the current patch, trying 'sudo ./build.sh docker' on Ubuntu 14.04 fails for me with:

          Step 2 : RUN useradd -g 0 -u 0 -k /root -m root
           ---> Running in 761a21dcc7fb
          useradd: user 'root' already exists

          I fixed this by setting $USER to $SUDO_USER on Linux in build.sh:

          if [ "$(uname -s)" == "Linux" ]; then
             USER=${SUDO_USER}
             ...

          Then everything works great for me.
          Hide
          nielsbasjes Niels Basjes added a comment -

          On Linux docker does not require sudo.

          In fact the docker manual says this:
          https://docs.docker.com/installation/ubuntulinux/#giving-non-root-access

          Starting in version 0.5.3, if you (or your Docker installer) create a Unix group called docker and add users to it, then the docker daemon will make the ownership of the Unix socket read/writable by the docker group when the daemon starts. The docker daemon must always run as the root user, but if you run the docker client as a user in the docker group then you don't need to add sudo to all the client commands.

          I always set it up like this (i.e. adding my user to the docker group), which means that $SUDO_USER is always unset.

          So to make both situations work, I expect we should only overwrite $USER with $SUDO_USER if the latter is set.

          nielsbasjes Niels Basjes added a comment -

          I just realized that the original issue states

          This is especially so when we need to test against several different versions of a programming language or VM (e.g. JDK6/JDK7/JDK8, Ruby 1.8.7/1.9.3/2.0/2.1).


          What we have created so far is actually a good way of setting up the 'default' build environment (which is a very good first step).

          Any ideas on how we're going to approach the 'real question'?

          tomwhite Tom White added a comment -

          Here's a new patch that sets USER to SUDO_USER if set, otherwise falls back to the current user, as suggested by Niels.

          To answer Niels's points:

          • It would be great to use Docker to run on Jenkins, but I'm not sure we can do that at the ASF yet. We can address any naming issues when they arise.
          • The Dockerfile does not specify which versions of each library to use, so you are right that when they are updated we might get failures due to incompatibilities. I propose we address those on a case-by-case basis, since they may uncover bugs that need fixing in the Avro code or tests.
          • Regarding the question about testing different versions of Java, Ruby, etc.: we could add more Dockerfiles for the different combinations. Separate issues would be best for that.
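
          As a sketch of the "more Dockerfiles" idea, a hypothetical share/docker/Dockerfile.jdk8 variant might look like this (file name, base image, and layout are all illustrative; the committed patch ships only a single Dockerfile):

          ```dockerfile
          # Hypothetical per-JDK variant; the committed patch ships only
          # share/docker/Dockerfile. Base image name is illustrative.
          FROM java:8-jdk
          # ...install the remaining language toolchains exactly as in the
          # main Dockerfile, so only the JDK differs between variants...
          ```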
          cutting Doug Cutting added a comment -

          Here's a version that works for me. On Linux with sudo, I need to set USER, not just USER_ID. Is it odious style to alter USER? Should we use a different variable? If so, we need to do so consistently in the script.

          nielsbasjes Niels Basjes added a comment -

          I agree with Doug.
          We should not tamper with a 'system' environment variable if we can avoid it.

          Proposal: in build.sh, do something like this:

          USER_NAME=${SUDO_USER:=$USER}
          USER_ID=$(id -u $USER_NAME)
          GROUP_ID=$(id -g $USER_NAME)
          

          and in the rest of the script replace the instances of ${USER} with ${USER_NAME}
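
          To illustrate the intended behavior of that fallback, a minimal sketch (user names are made up; the `:-` form shown here expands without assigning back to $SUDO_USER, but yields the same value as the `:=` form above):

          ```shell
          #!/bin/sh
          # Case 1: invoked via sudo -> SUDO_USER is set and wins.
          SUDO_USER=root
          USER=someuser
          USER_NAME=${SUDO_USER:-$USER}
          echo "with sudo: $USER_NAME"      # prints "with sudo: root"

          # Case 2: docker-group setup, no sudo -> SUDO_USER is unset,
          # so fall back to the current $USER.
          unset SUDO_USER
          USER_NAME=${SUDO_USER:-$USER}
          echo "without sudo: $USER_NAME"   # prints "without sudo: someuser"
          ```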

          I updated Doug's latest patch, did a clean test run and attached the resulting patch.

          P.S. I did see an error being reported in the csharp area but I think this has nothing to do with this change.

          cutting Doug Cutting added a comment -

          Niels, your version works well for me and is a nice improvement.

          Here's a new version that changes whitespace to keep things in 80 columns and also attempts to improve the comments & documentation a bit. No functional changes.

          nielsbasjes Niels Basjes added a comment -

          I ran a complete build from scratch using Doug's latest update and I think we should go with this one.

          So I say: +1 (non-binding)

          I submitted a new issue for the only real error I saw (which I think is not related to this patch): AVRO-1626

          tomwhite Tom White added a comment -

          Here's a one-line change to Doug's patch to set USER_NAME in the non-Linux case. With this change, the patch works for me.

          cutting Doug Cutting added a comment -

          Tom, this looks great. +1

          nielsbasjes Niels Basjes added a comment -

          +1 (non-binding)

          jira-bot ASF subversion and git services added a comment -

          Commit 1651069 from tomwhite@apache.org in branch 'avro/trunk'
          [ https://svn.apache.org/r1651069 ]

          AVRO-1537. Make it easier to set up a multi-language build environment.

          tomwhite Tom White added a comment -

          I committed this. Thanks for all your help Niels and Doug!

          hudson Hudson added a comment -

          SUCCESS: Integrated in AvroJava #510 (See https://builds.apache.org/job/AvroJava/510/)
          AVRO-1537. Make it easier to set up a multi-language build environment. (tomwhite: rev 1651069)

          • /avro/trunk/BUILD.txt
          • /avro/trunk/CHANGES.txt
          • /avro/trunk/build.sh
          • /avro/trunk/share/docker
          • /avro/trunk/share/docker/Dockerfile
          • /avro/trunk/share/rat-excludes.txt

            People

            • Assignee: tomwhite Tom White
            • Reporter: martinkl Martin Kleppmann
            • Votes: 0
            • Watchers: 7
