Apache Arrow / ARROW-8432

[Python][CI] Failure to download Hadoop

    Details

      Description

      Failing build: https://circleci.com/gh/ursa-labs/crossbow/11128?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link

      The failure is caused by a failed HTTP request at https://github.com/apache/arrow/blob/master/ci/docker/conda-python-hdfs.dockerfile#L36

      We should probably not rely on https://www.apache.org/dyn/mirrors/mirrors.cgi to download tarballs. The tarball downloads currently in the tree are:

      ci/docker/conda-python-hdfs.dockerfile
      36:RUN wget -q -O - "https://www.apache.org/dyn/mirrors/mirrors.cgi?action=download&filename=hadoop/common/hadoop-${hdfs}/hadoop-${hdfs}.tar.gz" | tar -xzf - -C /opt
      
      ci/docker/linux-apt-docs.dockerfile
      57:RUN wget -q -O - "https://www.apache.org/dyn/mirrors/mirrors.cgi?action=download&filename=maven/maven-3/${maven}/binaries/apache-maven-${maven}-bin.tar.gz" | tar -xzf - -C /opt
      
      python/manylinux1/scripts/build_thrift.sh
      22:  "https://www.apache.org/dyn/mirrors/mirrors.cgi?action=download&filename=${THRIFT_DOWNLOAD_PATH}" \
      
      python/manylinux201x/scripts/build_thrift.sh
      20:wget https://archive.apache.org/dist/thrift/${THRIFT_VERSION}/thrift-${THRIFT_VERSION}.tar.gz
      

      Factor these out into a reusable script for downloading Apache tarballs. It should contain hard-coded Apache mirrors and retry when connections fail.
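
      A minimal sketch of what such a helper could look like, assuming POSIX sh and wget are available in the build images. The function names (`apache_candidate_urls`, `download_apache_tarball`) and the mirror list are hypothetical and only illustrate the idea; they are not taken from the Arrow repository.

```shell
#!/bin/sh
# Hypothetical helper; the function names and mirror list are assumptions.

# Hard-coded base URLs, tried in order (illustrative, space-separated).
APACHE_MIRRORS="https://downloads.apache.org https://archive.apache.org/dist"

# Print one candidate URL per line for a path relative to the dist root,
# for example "thrift/${THRIFT_VERSION}/thrift-${THRIFT_VERSION}.tar.gz".
apache_candidate_urls() {
  for base in ${APACHE_MIRRORS}; do
    printf '%s/%s\n' "${base}" "$1"
  done
}

# Try each mirror in order; wget itself retries each URL up to 3 times.
# Writes the tarball to the file named by $2 and returns non-zero if
# every mirror fails.
download_apache_tarball() {
  for url in $(apache_candidate_urls "$1"); do
    if wget -q --tries=3 --waitretry=5 --retry-connrefused -O "$2" "${url}"; then
      return 0
    fi
    echo "download failed: ${url}" >&2
    rm -f "$2"  # discard any partial file before trying the next mirror
  done
  return 1
}
```

      With the functions sourced, a Dockerfile step could call something like `download_apache_tarball "hadoop/common/hadoop-${hdfs}/hadoop-${hdfs}.tar.gz" /tmp/hadoop.tar.gz && tar -xzf /tmp/hadoop.tar.gz -C /opt`. Downloading to a file instead of piping straight into tar means a failed attempt can be discarded before the next mirror is tried.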

              People

              • Assignee: Ben Kietzman (bkietz)
              • Reporter: Ben Kietzman (bkietz)

                  Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 4h