IMPALA-10409

Reduce total size of artifacts downloaded from S3 in building


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Infrastructure
    • Labels: None

    Description

      When building Impala, we need to download lots of dependencies.

      joemcdonnell helped track down where all the jars come from:

      Number of artifacts downloaded from each repo:
           16 cdh.rcs.releases.repo
         2067 central
          203 impala.cdp.repo
            2 impala.toolchain.kudu.repo 
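
A tally like the one above could be reproduced from a Maven build log. The following is a minimal sketch, assuming the log contains Maven 3.5+ style `Downloaded from <repo-id>: <url>` lines; the issue does not say how the numbers were actually gathered. The function name and the sample log lines (including the `example.invalid` URLs) are illustrative only.

```python
import re
from collections import Counter

# Matches Maven's "Downloaded from <repo-id>: <url>" progress lines.
# "Downloading from ..." lines are deliberately not matched.
DOWNLOADED_RE = re.compile(r"Downloaded from ([^:\s]+): \S+")


def count_downloads_by_repo(log_lines):
    """Return a Counter mapping repository id -> number of downloaded artifacts."""
    counts = Counter()
    for line in log_lines:
        match = DOWNLOADED_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Illustrative log lines, not taken from a real Impala build.
    sample_log = [
        "[INFO] Downloading from central: https://repo.maven.apache.org/maven2/junit/junit/4.12/junit-4.12.pom",
        "[INFO] Downloaded from central: https://repo.maven.apache.org/maven2/junit/junit/4.12/junit-4.12.pom (24 kB at 120 kB/s)",
        "[INFO] Downloaded from impala.cdp.repo: https://example.invalid/repo/a/a-1.0.jar (1.2 MB at 300 kB/s)",
        "[INFO] Downloaded from central: https://repo.maven.apache.org/maven2/junit/junit/4.12/junit-4.12.jar (315 kB at 1.1 MB/s)",
    ]
    for repo, n in sorted(count_downloads_by_repo(sample_log).items()):
        print(f"{n:>6} {repo}")
```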

      In my local environment, most of the build time is spent downloading artifacts from Cloudera's S3 bucket. Some of the files are large, e.g.

      458.2 MiB llvm-5.0.1-asserts-p3-gcc-7.5.0-ec2-package-ubuntu-16-04.tar.gz
      373.4 MiB llvm-5.0.1-p3-gcc-7.5.0-ec2-package-ubuntu-16-04.tar.gz
      1.1 GiB kudu-6a7cadc7e-gcc-7.5.0-ec2-package-ubuntu-16-04.tar.gz
      333.0 MiB apache-hive-3.1.3000.7.2.7.0-44-bin.tar.gz
      377.2 MiB hadoop-3.1.1.7.2.7.0-44.tar.gz
      370.4 MiB hbase-2.2.6.7.2.7.0-44-bin.tar.gz
      258.3 MiB ranger-2.1.0.7.2.7.0-44-admin.tar.gz
      63.4 MiB tez-0.9.1.7.2.7.0-44-minimal.tar.gz
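
To put the listing above in perspective, the sizes can be added up. A small sketch that sums only the eight files listed here (the full set of downloads is larger, and the 1.1 GiB figure is already rounded); the total comes to roughly 3.3 GiB.

```python
# Sizes copied from the listing above: (value, unit, file name).
ARTIFACTS = [
    (458.2, "MiB", "llvm-5.0.1-asserts-p3-gcc-7.5.0-ec2-package-ubuntu-16-04.tar.gz"),
    (373.4, "MiB", "llvm-5.0.1-p3-gcc-7.5.0-ec2-package-ubuntu-16-04.tar.gz"),
    (1.1, "GiB", "kudu-6a7cadc7e-gcc-7.5.0-ec2-package-ubuntu-16-04.tar.gz"),
    (333.0, "MiB", "apache-hive-3.1.3000.7.2.7.0-44-bin.tar.gz"),
    (377.2, "MiB", "hadoop-3.1.1.7.2.7.0-44.tar.gz"),
    (370.4, "MiB", "hbase-2.2.6.7.2.7.0-44-bin.tar.gz"),
    (258.3, "MiB", "ranger-2.1.0.7.2.7.0-44-admin.tar.gz"),
    (63.4, "MiB", "tez-0.9.1.7.2.7.0-44-minimal.tar.gz"),
]

# Conversion factors to MiB for the units used in the listing.
MIB_PER_UNIT = {"MiB": 1.0, "GiB": 1024.0}


def total_mib(artifacts):
    """Sum artifact sizes, converting every entry to MiB."""
    return sum(value * MIB_PER_UNIT[unit] for value, unit, _name in artifacts)


if __name__ == "__main__":
    total = total_mib(ARTIFACTS)
    print(f"{total:.1f} MiB ({total / 1024:.2f} GiB)")
```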
      

      Downloading from S3 is very slow in China, and possibly in other parts of the world. One solution is to refactor our dependencies to use Apache-released versions (IMPALA-10408) so that they can be downloaded from Apache mirrors.

      Another solution is to provide alternative download sources such as Alibaba Cloud or qcloud (Tencent Cloud). Developers could then choose a source or set up their own.
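
One way the alternative-source idea might look in a download script is an ordered mirror list with fallback. This is only a sketch under stated assumptions: the `ARTIFACT_MIRRORS` variable, both function names, and the `fetch` callback are hypothetical and are not part of Impala's existing build scripts.

```python
import os

# Hypothetical variable name; not an existing Impala build setting.
MIRRORS_ENV_VAR = "ARTIFACT_MIRRORS"


def mirror_list(environ=os.environ, default=()):
    """Return base URLs from a comma-separated env var, else the default list."""
    raw = environ.get(MIRRORS_ENV_VAR, "")
    mirrors = [m.strip().rstrip("/") for m in raw.split(",") if m.strip()]
    return mirrors or list(default)


def fetch_from_mirrors(relative_path, mirrors, fetch):
    """Try each mirror in order; return (url, payload) from the first that works.

    `fetch` is any callable that takes a URL and returns the payload,
    raising OSError on failure.
    """
    errors = []
    for base in mirrors:
        url = f"{base}/{relative_path.lstrip('/')}"
        try:
            return url, fetch(url)
        except OSError as exc:
            # Remember the failure and fall through to the next mirror.
            errors.append(f"{url}: {exc}")
    raise OSError("all mirrors failed:\n" + "\n".join(errors))
```

A real `fetch` could wrap `urllib.request.urlopen(url).read()`, whose `URLError` is a subclass of `OSError`. Keeping the fetch step injectable lets the fallback order be tested without network access.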

          People

            Assignee: Unassigned
            Reporter: Quanlong Huang (stigahuang)