IMPALA-4635

Reduce bootstrap time for Python virtualenv

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 2.8.0
    • Fix Version/s: Impala 2.9.0
    • Component/s: Infrastructure
    • Labels:
      None

      Description

      bootstrap_virtualenv.py can take a long time on its first run because it compiles Cython and kudu-python from scratch on a single core. Subsequent runs are faster because pip caches the results (although this has other potential downsides, such as hard-to-reproduce errors if something bad is cached).

      We should figure out a way to speed this up.

          Activity

          tarmstrong Tim Armstrong added a comment -

          IMPALA-4593,IMPALA-4635: fix some python build issues

          Build C/C++ packages with toolchain GCC to avoid ABI compatibility
          issues. This requires a multi-step bootstrapping process:
          1. install basic non-C/C++ packages into the virtualenv
          2. use Python 2.7 from the virtualenv to bootstrap the toolchain
          3. use toolchain gcc to build C/C++ packages
          4. build the kudu-python package with toolchain gcc and Cython
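
The four steps above can be sketched as an ordered plan in which only the later stages see the toolchain compiler. This is a minimal illustration, not the code from the patch: the stage names, the `bin/gcc` layout under the toolchain directory, and the helper functions are all hypothetical.

```python
import os

# Hypothetical stage names mirroring the four steps listed above.
# The boolean says whether the stage compiles C/C++ code.
STAGES = [
    ("pure-python-packages", False),  # 1. basic non-C/C++ packages
    ("bootstrap-toolchain", False),   # 2. run with the virtualenv's Python
    ("compiled-packages", True),      # 3. C/C++ packages, toolchain gcc
    ("kudu-python", True),            # 4. kudu-python, toolchain gcc and Cython
]


def stage_env(needs_compiler, toolchain_gcc_dir, base_env=None):
    """Return a copy of the environment, adding CC/CXX for compiled stages."""
    env = dict(os.environ if base_env is None else base_env)
    if needs_compiler:
        # Assumed layout: <toolchain_gcc_dir>/bin/gcc and .../bin/g++.
        env["CC"] = os.path.join(toolchain_gcc_dir, "bin", "gcc")
        env["CXX"] = os.path.join(toolchain_gcc_dir, "bin", "g++")
    return env


def plan(toolchain_gcc_dir, base_env=None):
    """Return [(stage_name, env)] in the order the stages must run."""
    return [(name, stage_env(needs, toolchain_gcc_dir, base_env))
            for name, needs in STAGES]
```

The ordering matters because the toolchain compiler does not exist until step 2 has run, so steps 3 and 4 cannot be folded into step 1.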

          To avoid potentially pulling in cached versions of packages
          built with a different compiler, this patch also disables pip's
          caching. This should not have a significant effect on performance
          since we've enabled ccache and cache downloaded packages in
          infra/python/deps.
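
One way to combine "no pip cache" with a local directory of downloaded packages is sketched below. It assumes pip's `--no-cache-dir`, `--no-index` and `--find-links` options are used; the source only says that caching is disabled and that downloads are kept in infra/python/deps, so the exact flags in the patch may differ. The requirements file name is hypothetical.

```python
def pip_install_cmd(python, requirements_file, deps_dir):
    """Build a pip command that ignores pip's cache and installs only
    from previously downloaded archives in deps_dir."""
    return [
        python, "-m", "pip", "install",
        "--no-cache-dir",           # never reuse wheels built by another compiler
        "--no-index",               # do not contact a package index
        "--find-links", deps_dir,   # resolve packages from the local directory
        "-r", requirements_file,
    ]


# Example (hypothetical paths):
cmd = pip_install_cmd("infra/python/env/bin/python",
                      "compiled-requirements.txt",
                      "infra/python/deps")
```

With this split, source archives are still reused across runs, but every compiled artifact is rebuilt by the current compiler.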

          Improve bootstrapping time significantly by using ccache and by
          parallelising the numpy build - the most expensive part of the
          install process. On a system with a warmed-up ccache,
          bootstrapping after deleting infra/python/env takes 1m16s. Previously
          it could take over 5m.
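
A rough sketch of how a build environment might enable both speedups. It assumes that prefixing the compiler with `ccache` in `CC`/`CXX` is acceptable to the Python build tooling, and that the NumPy version in use honours the `NPY_NUM_BUILD_JOBS` environment variable; the patch itself may achieve the same effect differently. The function name is hypothetical.

```python
import multiprocessing
import os


def fast_build_env(gcc, gxx, base_env=None, jobs=None):
    """Return an environment that routes compiles through ccache and
    asks NumPy's build to use several parallel jobs."""
    env = dict(os.environ if base_env is None else base_env)
    env["CC"] = "ccache " + gcc
    env["CXX"] = "ccache " + gxx
    # Default to one job per available core.
    env["NPY_NUM_BUILD_JOBS"] = str(jobs or multiprocessing.cpu_count())
    return env
```

The returned dictionary can be passed as `env=` to `subprocess.check_call` when invoking pip, so the settings apply only to the bootstrap and not to the caller's shell.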

          Testing:
          Tested manually on Ubuntu 16.04 to confirm that it fixes the ABI
          problem mentioned in IMPALA-4593. Initially "import kudu" failed
          in my dev environment. After deleting infra/python/env and
          re-bootstrapping, "import kudu" succeeded.
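
The manual check described above could be scripted roughly as follows. This is an illustrative helper, not part of the patch; it runs the given interpreter in a subprocess so that the import is resolved inside the virtualenv rather than in the caller's process.

```python
import os
import subprocess
import sys


def can_import(python, module):
    """Return True if `python -c "import <module>"` exits successfully."""
    with open(os.devnull, "w") as devnull:
        rc = subprocess.call([python, "-c", "import " + module],
                             stdout=devnull, stderr=devnull)
    return rc == 0


# Example: check the current interpreter; for the virtualenv one would
# pass its own interpreter path instead of sys.executable.
ok = can_import(sys.executable, "json")
```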

          Also ran the standard test suite on CentOS 6 and built Impala on
          a range of platforms (CentOS 5, 6, 7; SLES 11, 12; Debian 6, 7;
          Ubuntu 12.04, 14.04, 16.04) to make sure nothing broke.

          Change-Id: I9e807510eddeb354069e0478363f649a1c1b75cf
          Reviewed-on: http://gerrit.cloudera.org:8080/6218
          Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
          Tested-by: Impala Public Jenkins

            People

            • Assignee:
              tarmstrong Tim Armstrong
              Reporter:
              tarmstrong Tim Armstrong