  1. IMPALA
  2. IMPALA-10848

Provide compile-only option to skip downloading test dependencies

Details

    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved
    • Infrastructure

    Description

      Compiling Impala is not easy for a beginner. A portion of the failures occur while downloading or installing dependencies.

      For instance, old versions of Impala may fail to compile because the CDH components of old GBNs have been removed from S3. However, the CDH component artifacts are only used in testing (for the minicluster and for holding test data), so Impala can still be compiled without them.

      Pip dependencies are another example. A community user reported a failure that occurred while installing pywebhdfs (see the attachment pywebhdfs_failure.png).

      However, a simple git grep shows that pywebhdfs is only used in tests:

      $ git grep pywebhdfs
      bin/bootstrap_system.sh:#  >>> from pywebhdfs.webhdfs import PyWebHdfsClient
      infra/python/deps/requirements.txt:pywebhdfs == 0.3.2
      tests/common/impala_test_suite.py:    #     HDFS: uses a mixture of pywebhdfs (which is faster than the HDFS CLI) and the
      tests/util/hdfs_util.py:from pywebhdfs.webhdfs import PyWebHdfsClient, errors, _raise_pywebhdfs_exception
      tests/util/hdfs_util.py:      _raise_pywebhdfs_exception(response.status_code, response.text)
      tests/util/hdfs_util.py:      _raise_pywebhdfs_exception(response.status_code, response.text)
      tests/util/hdfs_util.py:      _raise_pywebhdfs_exception(response.status_code, response.text)
      tests/util/hdfs_util.py:      _raise_pywebhdfs_exception(response.status_code, response.text) 
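The same check can be run over every pinned package instead of one at a time. The following is a minimal sketch, not something that exists in the Impala tree: the function name `list_test_only_deps` is hypothetical, and it assumes a requirements file in the `name == version` format shown above and a layout where test code lives under `tests/`. It uses plain `grep -r` so that it also works outside a git checkout, ignores comment-only lines (such as the one in bin/bootstrap_system.sh), and matches on the pip package name, so packages whose import name differs from their pip name would be misclassified.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each package from a requirements file that has
# no non-comment reference outside tests/.
# Usage: list_test_only_deps <requirements-file> [repo-root]
list_test_only_deps() {
  local req_file="$1" root="${2:-.}" pkg outside
  # Drop comments, then keep only the text before the first version operator.
  sed -e 's/#.*//' -e 's/[[:space:]]*[=<>!~;].*//' "$req_file" |
  while read -r pkg; do
    [ -n "$pkg" ] || continue
    # Count matching lines that are outside tests/ and are not comment-only.
    # The requirements file itself is excluded from the search.
    outside=$(cd "$root" &&
      grep -rnIF --exclude="$(basename "$req_file")" -- "$pkg" . |
      grep -v '^\./tests/' |
      grep -Evc '^[^:]+:[0-9]+:[[:space:]]*#' || true)
    if [ "$outside" -eq 0 ]; then
      echo "$pkg"
    fi
  done
}
```

Run from the repository root, `list_test_only_deps infra/python/deps/requirements.txt` would list the candidates for skipping in a compile-only setup. Each result still needs a manual review, since a text search cannot see indirect or dynamic imports.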

      If the user just wants to compile Impala and deploy it in their existing Hadoop cluster, dealing with these failures is a waste of their time.

      Target for this JIRA

      • Provide a compile-only option for bin/bootstrap_system.sh. It should skip downloading and installing dependencies that compilation does not use, such as postgresql.
      • Provide a compile-only option for buildall.sh. It should skip downloading the cdh/cdp components that compilation does not use.
      • Update our wiki about this.
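One possible shape for such an option is sketched below. This is an illustration only: the `--compile-only` flag, the `COMPILE_ONLY` variable, the function names, and the step names are all hypothetical and do not exist in bin/bootstrap_system.sh or buildall.sh. The steps are printed rather than executed so that the gating logic can be read on its own.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a compile-only switch for a bootstrap-style script.
COMPILE_ONLY=false

# Parse command-line arguments; only the hypothetical --compile-only flag is known.
parse_args() {
  COMPILE_ONLY=false
  local arg
  for arg in "$@"; do
    case "$arg" in
      --compile-only) COMPILE_ONLY=true ;;
      *) echo "Unknown option: $arg" >&2; return 1 ;;
    esac
  done
}

# Print the setup steps that would run, one per line.
planned_steps() {
  # Always required, because compilation depends on them.
  echo "install-build-tools"
  echo "bootstrap-toolchain"
  # Test-only steps (placeholder names) are skipped in compile-only mode.
  if [ "$COMPILE_ONLY" != "true" ]; then
    echo "install-postgresql"
    echo "download-test-only-cdh-cdp-components"
    echo "install-test-python-deps"
  fi
}
```

Calling `parse_args --compile-only` followed by `planned_steps` prints only the two build-related steps, while calling `parse_args` with no arguments keeps the full list. A real implementation would replace each `echo` with the corresponding install or download command.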

      Note that we already have some environment variables that control download behavior, e.g. SKIP_PYTHON_DOWNLOAD and SKIP_TOOLCHAIN_BOOTSTRAP. We just need to make the compile-only scenario work with minimal requirements and document it.

      Attachments

        1. pywebhdfs_failure.png
          209 kB
          Quanlong Huang

          People

            yx91490 XiangYang
            stigahuang Quanlong Huang