Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Impala 3.4.0
None
Description
[Note: this JIRA was filed in relation to the ongoing effort to make the impala-shell compatible with python 3]
The Impala python development environment is a fairly convoluted affair: some packages are installed in infra/python/env, some come from the toolchain, and some are generated and live in the shell directory. Generally speaking, if you launch impala-python and import a module, it is not easy to predict where that module will be loaded from.
$ python
Python 2.7.10 (default, Aug 17 2018, 19:45:58)
[GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.0.42)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sasl
>>> sasl
<module 'sasl' from '/home/systest/Impala/shell/ext-py/sasl-0.1.1/dist/sasl-0.1.1-py2.7-linux-x86_64.egg/sasl/__init__.pyc'>
>>> import requests
>>> requests
<module 'requests' from '/home/systest/Impala/infra/python/env/local/lib/python2.7/site-packages/requests/__init__.pyc'>
>>> import Logging
>>> Logging
<module 'Logging' from '/home/systest/Impala/shell/gen-py/Logging/__init__.pyc'>
>>> import thrift
>>> thrift
<module 'thrift' from '/home/systest/Impala/toolchain/thrift-0.9.3-p7/python/lib/python2.7/site-packages/thrift/__init__.pyc'>
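Rather than importing each module and inspecting it, you can ask the interpreter where a module would be loaded from without executing it. This is a minimal sketch assuming Python 3.4+ (importlib.util.find_spec); the helper name module_origins is hypothetical and is not part of any Impala script.

```python
import importlib.util
from typing import Dict, Iterable, Optional


def module_origins(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each top-level module name to the location it would be loaded from.

    importlib.util.find_spec searches sys.path (which includes PYTHONPATH)
    without running the module's code. Modules that cannot be found map to
    None. Dotted names are not handled here, because find_spec imports the
    parent package for those.
    """
    origins: Dict[str, Optional[str]] = {}
    for name in names:
        spec = importlib.util.find_spec(name)
        origins[name] = spec.origin if spec is not None else None
    return origins


if __name__ == "__main__":
    # Same modules as the session above; any that are not installed print None.
    for name, origin in module_origins(["sasl", "requests", "Logging", "thrift"]).items():
        print(f"{name:10} {origin}")
```

Running this under each of the invocation paths (build, tests, shell) is one way to see how differently they resolve the same import.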
Really, there is no single coherent environment, just whatever collection of modules happens to be available at a given time for a given type of invocation. That collection is assembled behind the scenes by scripts such as bin/set-pythonpath.sh and bin/impala-python-common.sh, which cobble together a PYTHONPATH from known locations and current environment variables.
As far as I can tell, there are three important contexts where python comes into play:
- during the build process (e.g., during data load via testdata/bin/load_nested.py)
- when running the py.test-based end-to-end (e2e) tests
- whenever the impala-shell is invoked
As noted in IMPALA-7825 (and in a conversation I had with Sahil Takiar), we depend on thrift 0.9.3 during the build process. This seems to come into play while loading test data (specifically, when calling testdata/bin/load_nested.py), mainly because of an earlier, well-intentioned but probably misguided attempt to reuse code from the test framework. The reused test code involves impyla and/or thrift-sasl, which still rely on thrift 0.9.3. So our test framework and, by extension, the build both inherit the same limitation.
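To check which versions of these packages a given interpreter actually sees, one option is to read installed distribution metadata. This is a minimal sketch assuming Python 3.8+ (importlib.metadata); installed_version is a hypothetical helper, and the distribution names thrift, impyla, and thrift_sasl are assumptions about how the packages were installed. A package placed on PYTHONPATH without its metadata directory will be reported as missing even though it can still be imported.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of `distribution`, or None if not found.

    Looks up *.dist-info / *.egg-info metadata on the current sys.path,
    so it reflects this interpreter's environment without importing the package.
    """
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumed distribution names; adjust to match the actual environment.
    for dist in ("thrift", "impyla", "thrift_sasl"):
        print(f"{dist:12} {installed_version(dist)}")
```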
The impala-shell, on the other hand, luckily doesn't directly reuse any of the same test modules, and there is really no need to keep it pinned to 0.9.3. However, since calling impala-shell.sh winds up invoking set-pythonpath.sh, the same script that sets up the environment during building or testing, thrift 0.9.3 just leaks over by default.
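The leak is ordinary environment inheritance: a child process receives its parent's environment, including PYTHONPATH, unless the caller passes an explicit one. A small sketch demonstrating this with subprocess (Python 3.7+); child_pythonpath and the /opt/toolchain path are hypothetical and used only for illustration.

```python
import os
import subprocess
import sys
from typing import Mapping, Optional

# One-liner run in the child interpreter to report what it inherited.
_PRINT_PYTHONPATH = "import os; print(os.environ.get('PYTHONPATH', ''))"


def child_pythonpath(env: Optional[Mapping[str, str]] = None) -> str:
    """Run a child interpreter and return the PYTHONPATH it sees.

    With env=None the child inherits this process's environment, which is
    how a PYTHONPATH exported by a wrapper script reaches everything it launches.
    """
    result = subprocess.run(
        [sys.executable, "-c", _PRINT_PYTHONPATH],
        env=None if env is None else dict(env),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Hypothetical directory standing in for the thrift 0.9.3 toolchain path.
    os.environ["PYTHONPATH"] = "/opt/toolchain/thrift-0.9.3/site-packages"
    print(child_pythonpath())  # inherited: prints the 0.9.3 path
    clean = {k: v for k, v in os.environ.items() if k != "PYTHONPATH"}
    print(repr(child_pythonpath(clean)))  # prints ''
```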
As it turns out, thrift 0.9.3 is also one of the many limitations restricting the impala-shell to python 2. Luckily, with IMPALA-7924 resolved, thrift-0.11.0 is available; we just have to use it. The way to accomplish that is to decouple the impala-shell from both set-pythonpath.sh and impala-python-common.sh.
As a first pass, we can fix the dev environment by having impala-shell.sh itself do whatever is required to find its python dependencies, and specify thrift-0.11.0 there. Thrift 0.11.0 should also be used by both of the scripts that create the tarballs packaging the impala-shell for customer environments. Neither of these changes should adversely affect building Impala or running the py.test test framework.
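Seen from the launcher's side, the idea is to build the child environment from scratch instead of extending whatever PYTHONPATH the caller exported. This is a Python sketch of that approach under stated assumptions, not the contents of impala-shell.sh (which is a shell script); shell_environment and the dependency paths are hypothetical.

```python
import os
from typing import Dict, Mapping, Sequence


def shell_environment(base: Mapping[str, str], shell_deps: Sequence[str]) -> Dict[str, str]:
    """Build a child environment whose PYTHONPATH is exactly `shell_deps`.

    Any PYTHONPATH already present in `base` (e.g. one exported by a build
    or test wrapper) is discarded rather than extended, so nothing pinned
    for the build can leak into the shell. `base` itself is not modified.
    """
    env = {k: v for k, v in base.items() if k != "PYTHONPATH"}
    if shell_deps:
        env["PYTHONPATH"] = os.pathsep.join(shell_deps)
    return env


if __name__ == "__main__":
    # Hypothetical locations; the real impala-shell layout may differ.
    deps = [
        "/path/to/Impala/shell/gen-py",
        "/path/to/toolchain/thrift-0.11.0/python/site-packages",
    ]
    print(shell_environment(os.environ, deps)["PYTHONPATH"])
```

Replacing PYTHONPATH rather than appending to it is the point: appending would put the new entries after the inherited ones, so an earlier thrift 0.9.3 entry would still win.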
Issue Links
- causes
  - IMPALA-10434 impala-shell crash in parsing multiline queries that contain UTF-8 characters (Resolved)
- incorporates
  - IMPALA-8908 Bad error message when failing to connect to HTTPS endpoint with shell (Resolved)
- is related to
  - IMPALA-3343 Impala-shell compatibility with python 3 (Resolved)