Details
Type: Sub-task
Status: Open
Priority: Major
Resolution: Unresolved
Description
The Docker image can be made leaner if multiple steps are grouped together and run by a single shell script. For example, all the install commands for hadoop-runner can be wrapped in a setup shell script:
#!/bin/bash
# Abort on the first failing command so a broken download fails the image build.
set -euo pipefail

# Enable EPEL and install the base packages.
rpm -Uvh https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
yum install -y sudo python2-pip wget nmap-ncat jq java-11-openjdk
pip install robotframework

# dumb-init is used as the container entrypoint.
wget -O /usr/local/bin/dumb-init https://github.com/Yelp/dumb-init/releases/download/v1.2.0/dumb-init_1.2.0_amd64
chmod +x /usr/local/bin/dumb-init

# Kerberos keytab directory.
mkdir -p /etc/security/keytabs && chmod -R a+wr /etc/security/keytabs

# Byteman agent.
wget -O /opt/byteman.jar https://repo.maven.apache.org/maven2/org/jboss/byteman/byteman/4.0.4/byteman-4.0.4.jar
chmod o+r /opt/byteman.jar

# async-profiler.
mkdir -p /opt/profiler && \
  cd /opt/profiler && \
  curl -L https://github.com/jvm-profiling-tools/async-profiler/releases/download/v1.5/async-profiler-1.5-linux-x64.tar.gz | tar xvz

# Kerberos client tools.
yum install -y krb5-workstation

# Hadoop configuration and log directories (world-writable, sticky bit).
mkdir -p /etc/hadoop && mkdir -p /var/log/hadoop && chmod 1777 /etc/hadoop && chmod 1777 /var/log/hadoop
The Dockerfile is then simplified to:
# setup.sh installs the EPEL 7 release package, so use the CentOS 7 base image.
FROM centos:7
ADD setup.sh /
RUN /setup.sh
ADD scripts /opt/
ADD scripts/krb5.conf /etc/
WORKDIR /opt/hadoop
ENV HADOOP_LOG_DIR=/var/log/hadoop
ENV HADOOP_CONF_DIR=/etc/hadoop
ENTRYPOINT ["/usr/local/bin/dumb-init", "--", "/opt/starter.sh"]
This arrangement can significantly improve the rebuild performance of the Docker image. The resulting image is 150 MB smaller than the current hadoop-runner image on GitHub. Having fewer intermediate layers also reduces the number of layers Docker has to store and reference-count, which improves space usage.
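To make the layer argument concrete, the shell function below counts the Dockerfile instructions that add a filesystem layer. It relies on the rule from Docker's build best-practices documentation that only RUN, COPY and ADD create layers, while instructions such as ENV, WORKDIR and ENTRYPOINT only change image metadata. The function name count_fs_layers is hypothetical, and the result is only a static estimate; running docker history on the built image shows the layers an actual build produced.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: print how many instructions in a Dockerfile add a
# filesystem layer on top of the base image. RUN, COPY and ADD create layers;
# FROM, ENV, WORKDIR, ENTRYPOINT and similar instructions do not.
# Continuation lines of a multi-line RUN do not start with a keyword,
# so each RUN is counted once.
count_fs_layers() {
  local dockerfile="$1"
  # -c counts matching lines, -i because instructions are case-insensitive.
  # grep exits 1 when nothing matches (it still prints 0) and 2 on errors
  # such as a missing file; only the "no match" case is treated as success.
  grep -ciE '^[[:space:]]*(RUN|COPY|ADD)[[:space:]]' "$dockerfile" || [[ $? -eq 1 ]]
}

# Usage: count_fs_layers path/to/Dockerfile
if [[ $# -ge 1 ]]; then
  count_fs_layers "$1"
fi
```

For the simplified Dockerfile above this reports 4 (one RUN and three ADD instructions); every extra RUN line in a Dockerfile raises the count by one.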
We could also use two scripts: one to install binaries and another to configure the image. This can reduce build time even further if the third-party binaries rarely change, because Docker can reuse the cached install layer and rerun only the configuration step.
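A minimal sketch of the configuration half of that split, assuming a hypothetical configure.sh that takes over the mkdir/chmod steps from setup.sh while the package installs and downloads stay in a separate install script. The root-prefix argument and the configure_image function name are assumptions, added so the script can be tried outside a container; inside the image it would be called with / as the prefix.

```shell
#!/bin/bash
# configure.sh (hypothetical): the configuration half of setup.sh.
set -euo pipefail

# Create the directories the image expects, relative to a root prefix.
configure_image() {
  local root="${1%/}"  # strip a trailing slash so "/" yields absolute paths
  # Keytab directory, readable and writable by everyone as in setup.sh.
  mkdir -p "$root/etc/security/keytabs"
  chmod -R a+wr "$root/etc/security/keytabs"
  # Hadoop config and log directories: mode 1777 (world-writable, sticky bit).
  mkdir -p "$root/etc/hadoop" "$root/var/log/hadoop"
  chmod 1777 "$root/etc/hadoop" "$root/var/log/hadoop"
}

# Only act when a root prefix is passed, e.g. "RUN /configure.sh /".
if [[ $# -ge 1 ]]; then
  configure_image "$1"
fi
```

In the Dockerfile, the install script would be added and run first and configure.sh added afterwards (for example ADD configure.sh / followed by RUN /configure.sh /), so editing configure.sh only invalidates the layers from that point on and the install layer stays cached.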