IMPALA-3932

virtualenv does not build binary python packages with toolchain

      Description

      The Python virtualenv used by Impala installs many packages, most of which are pure Python. However, the sasl package (and possibly others) relies on C++ code that is compiled when pip installs the package. The compilation does not use the Impala toolchain, so the compiled binaries reference the system libraries. On systems whose libstdc++ is newer than the toolchain's, loading the sasl library then fails, because the toolchain libstdc++ that gets picked up at runtime lacks the symbol versions the binary was built against.

      E.g. on Ubuntu 16, any python code that imports sasl fails. This means we cannot even start the mini cluster because the code to start the Hive server imports sasl.

       --> Starting Hive Server and Metastore Service
      Traceback (most recent call last):
        File "/data/impala-build/Impala/testdata/bin/wait-for-metastore.py", line 23, in <module>
          from tests.util.thrift_util import create_transport
        File "/data/impala-build/Impala/tests/util/thrift_util.py", line 20, in <module>
          import sasl
        File "build/bdist.linux-x86_64/egg/sasl/__init__.py", line 1, in <module>
        File "build/bdist.linux-x86_64/egg/sasl/saslwrapper.py", line 7, in <module>
        File "build/bdist.linux-x86_64/egg/_saslwrapper.py", line 7, in <module>
        File "build/bdist.linux-x86_64/egg/_saslwrapper.py", line 6, in __bootstrap__
      
      ImportError: /tmp/toolchain-build/native-toolchain/build/gcc-4.9.2/lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by /root/.python-eggs/sasl-0.1.1-py2.7-linux-x86_64.egg-tmp/_saslwrapper.so)
      
      Error in /data/impala-build/Impala/testdata/bin/run-hive-server.sh at line 54: impala-python ${CLUSTER_BIN}/wait-for-metastore.py --transport=${METASTORE_TRANSPORT}
      
      Error in ./testdata/bin/run-all.sh at line 51: tee ${IMPALA_CLUSTER_LOGS_DIR}/run-hive-server.log
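The ImportError above says that the toolchain libstdc++.so.6 does not provide the GLIBCXX_3.4.21 symbol version that _saslwrapper.so requires. A rough way to confirm which versions a given libstdc++ offers is to scan the file for GLIBCXX_ tags, much like piping `strings` through `grep`. The sketch below is only a heuristic under that assumption; the function names are made up for illustration and are not part of Impala.

```python
import re

# Matches symbol-version tags such as GLIBCXX_3.4.21 embedded in a shared object.
_GLIBCXX_TAG = re.compile(rb"GLIBCXX_(\d+(?:\.\d+)*)")


def glibcxx_versions(blob):
    """Return the sorted, de-duplicated GLIBCXX versions found in raw bytes."""
    found = {m.group(1).decode("ascii") for m in _GLIBCXX_TAG.finditer(blob)}
    # Sort numerically so that 3.4.9 comes before 3.4.20.
    return sorted(found, key=lambda v: tuple(int(p) for p in v.split(".")))


def provides(blob, required):
    """True if the bytes contain the required tag, e.g. 'GLIBCXX_3.4.21'."""
    return required.split("_", 1)[1] in glibcxx_versions(blob)


def read_library(path):
    """Read a shared object from disk; the path is supplied by the caller."""
    with open(path, "rb") as handle:
        return handle.read()
```

For example, `provides(read_library(path_to_libstdcxx), "GLIBCXX_3.4.21")` could be run once against the toolchain copy and once against the system copy to see which one satisfies the extension.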
      

      We need to either:
      a) Figure out how to build python packages against our toolchain libraries. (This stackoverflow post may be helpful.)
      b) Avoid setting LD_LIBRARY_PATH for the entire environment (IMPALA-3926)

      I suspect (a) is worth doing now (we need this for Ubuntu 16), but (b) may be a better long-term path.
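One possible shape for option (a) is to point the extension build at the toolchain compiler through environment variables, which distutils-based builds generally honor (CC, CXX, LDFLAGS). This is a sketch only, not what Impala does: the `bin/gcc` and `bin/g++` layout under the GCC directory is an assumption, `lib64` is taken from the path in the error message, and both helper names are hypothetical.

```python
import os


def toolchain_build_env(gcc_home, base_env=None):
    """Return a copy of the environment that points the build at a given GCC."""
    env = dict(os.environ if base_env is None else base_env)
    lib_dir = os.path.join(gcc_home, "lib64")
    # Assumed layout: <gcc_home>/bin/gcc and <gcc_home>/bin/g++.
    env["CC"] = os.path.join(gcc_home, "bin", "gcc")
    env["CXX"] = os.path.join(gcc_home, "bin", "g++")
    # Link against the toolchain libstdc++ and record its directory in the
    # extension's rpath so it can be found even when LD_LIBRARY_PATH is unset.
    flags = "-L{0} -Wl,-rpath,{0}".format(lib_dir)
    env["LDFLAGS"] = (flags + " " + env.get("LDFLAGS", "")).strip()
    return env


def pip_install_command(python, requirement):
    """Build (but do not run) a pip command that forces a source build."""
    return [python, "-m", "pip", "install", "--no-binary", ":all:", requirement]
```

The two pieces would be combined with something like `subprocess.check_call(pip_install_command(venv_python, "sasl==0.1.3"), env=toolchain_build_env(gcc_home))`, where `venv_python` and `gcc_home` are supplied by the caller.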

          Activity

          Venkat Sambath added a comment -

          Thanks, Tim, for the workaround. Adding the steps below so that it is easy to follow.

          On Ubuntu 16.04, libstdc++.so.6 is usually under /usr/lib/x86_64-linux-gnu/

          Once you have confirmed that libstdc++.so.6 exists there, run the command below and then restart the Impala build.
          Example:
          export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu/:$LD_LIBRARY_PATH
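The same workaround can be applied to a single child process instead of the whole shell. The sketch below builds an environment with the system directory first on LD_LIBRARY_PATH; unlike the plain export, it drops empty entries and earlier copies of the directory. The helper names are illustrative, not part of Impala.

```python
import os


def prepend_library_path(directory, env=None):
    """Return a copy of env with directory first on LD_LIBRARY_PATH."""
    new_env = dict(os.environ if env is None else env)
    current = new_env.get("LD_LIBRARY_PATH", "")
    # Drop empty entries and any earlier copy of the directory.
    wanted = directory.rstrip("/")
    rest = [p for p in current.split(":") if p and p.rstrip("/") != wanted]
    new_env["LD_LIBRARY_PATH"] = ":".join([directory] + rest)
    return new_env


def has_libstdcxx(directory):
    """Check that libstdc++.so.6 exists in directory before relying on it."""
    return os.path.exists(os.path.join(directory, "libstdc++.so.6"))
```

A caller might check `has_libstdcxx("/usr/lib/x86_64-linux-gnu/")` first and then pass the result of `prepend_library_path(...)` as the `env` argument of `subprocess.check_call` when starting the build.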

          tarmstrong Tim Armstrong added a comment -

          I think IMPALA-3926 will solve this in practice. The problem is that the toolchain libgcc ends up before the system libgcc on LD_LIBRARY_PATH.

          A workaround is to set LD_LIBRARY_PATH to point at the directory containing the system version. This works in my experience if the system version is newer than the toolchain version. This is not a great solution since we're then not testing with the toolchain library, but is a reasonable workaround for a local dev environment.

          dknupp David Knupp added a comment -

          Setting PYTHONPATH if you're already working within a virtualenv seems bad.

          Also, FWIW, sasl is not within shell/ext-py on the Ubuntu 16 box:

          (env)root@impala-u16:/data/impala-build/Impala# ls -l shell/ext-py/
          total 0
          drwxr-xr-x 5 root root 218 Jul 29 11:19 prettytable-0.7.1
          drwxr-xr-x 9 root root 284 Jul 29 11:19 sqlparse-0.1.14
          
          mjacobs Matthew Jacobs added a comment -

          Another thing that's really weird is that we have a separate version of the sasl python package in shell/ext-py/, which is version 0.1.1 and lives outside of the virtualenv, but still may be causing issues. It looks like we set PYTHONPATH to include the contents of that directory.
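To check whether a PYTHONPATH entry shadows the virtualenv copy of a package, one can list every search-path entry that could satisfy the import, in order; the first hit is the one Python uses. This is a simplified sketch that only recognizes source packages and plain `.py` modules (not eggs or compiled extensions), and the function name is made up for illustration.

```python
import os


def module_locations(name, search_path):
    """List the search_path entries that could satisfy 'import name', in order."""
    hits = []
    for entry in search_path:
        package_init = os.path.join(entry, name, "__init__.py")
        plain_module = os.path.join(entry, name + ".py")
        if os.path.isfile(package_init) or os.path.isfile(plain_module):
            hits.append(entry)
    return hits
```

Running `module_locations("sasl", sys.path)` inside the interpreter in question would show whether more than one directory provides sasl and which one comes first.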

          mjacobs Matthew Jacobs added a comment -

          Sounds good, thanks. Keep in mind that environment may be in bad shape.

          dknupp David Knupp added a comment -

          Weirder than that, why is thrift listed as a sub-dependency of impyla?

          impyla == 0.11.2
            bitarray == 0.8.1
            sasl == 0.1.3
            six == 1.9.0
            # Thrift usually comes from the thirdparty dir but in case the virtualenv is needed
            # before thirdparty is built thrift will be installed anyways.
            thrift == 0.9.0
            thrift_sasl == 0.1.0
          

          Clearly that's not the case, right? (I don't know much about impyla, but I do know that thrift seems to be required in other places; it's referenced in a bunch of places throughout the modules within tests/.)

          I went ahead and installed thrift==0.9.0 in the virtualenv on Thomas' machine. When he's done with data loading, I'm going to try to restart the Hive server with the 'import sasl' line restored.

          mjacobs Matthew Jacobs added a comment -

          Actually there's a weird comment:

            # Thrift usually comes from the thirdparty dir but in case the virtualenv is needed
            # before thirdparty is built thrift will be installed anyways.
            thrift == 0.9.0
          

          Not sure what that means or if it's still valid.

          mjacobs Matthew Jacobs added a comment -

          Weird, thrift should be installed: infra/python/deps/requirements.txt includes thrift == 0.9.0. It's possible that the environment is in a bad state at this point. It's been hacked up a lot.

          dknupp David Knupp added a comment -

          I'm not sure I'm making a point just yet – just doing some exploratory poking. (All of this is being done on the same Ubuntu 16 machine that Thomas is using.)

          Like, here's another weird thing. After I activate the Impala virtualenv and try to invoke the line that threw the error, I get a wholly different error:

          root@impala-u16:/data/impala-build/Impala# source infra/python/env/bin/activate
          (env)root@impala-u16:/data/impala-build/Impala# python
          Python 2.7.11+ (default, Apr 17 2016, 14:00:29)
          [GCC 5.3.1 20160413] on linux2
          Type "help", "copyright", "credits" or "license" for more information.
          >>> from tests.util.thrift_util import create_transport
          Traceback (most recent call last):
            File "<stdin>", line 1, in <module>
            File "tests/util/thrift_util.py", line 17, in <module>
              from thrift.transport.TSocket import TSocket
          ImportError: No module named thrift.transport.TSocket
          

          If I drop out of the interpreter and check which Python packages are installed in the virtualenv, thrift is not even listed:

          (env)root@impala-u16:/data/impala-build/Impala# pip list
          You are using pip version 7.1.0, however version 8.1.2 is available.
          You should consider upgrading via the 'pip install --upgrade pip' command.
          AllPairs (2.0.1)
          apipkg (1.4)
          bitarray (0.8.1)
          boto3 (1.2.3)
          botocore (1.3.30)
          cm-api (10.0.0)
          Cython (0.23.4)
          docopt (0.6.2)
          docutils (0.12)
          ecdsa (0.13)
          execnet (1.4.0)
          Fabric (1.10.2)
          Flask (0.10.1)
          futures (3.0.5)
          hdfs (2.0.2)
          impyla (0.11.2)
          ipython (1.2.1)
          itsdangerous (0.24)
          Jinja2 (2.8)
          jmespath (0.9.0)
          kudu-python (0.1.1)
          linecache2 (1.0.0)
          MarkupSafe (0.23)
          monkeypatch (0.1rc3)
          numpy (1.10.4)
          ordereddict (1.1)
          paramiko (1.15.2)
          pbr (1.8.1)
          pexpect (3.3)
          pg8000 (1.10.2)
          pip (7.1.0)
          prettytable (0.7.2)
          psutil (0.7.1)
          py (1.4.30)
          pycrypto (2.6.1)
          pyelftools (0.23)
          pyparsing (2.0.3)
          pytest (2.7.2)
          pytest-random (0.2)
          pytest-xdist (1.12)
          python-dateutil (2.5.2)
          python-magic (0.4.11)
          pywebhdfs (0.3.2)
          requests (2.7.0)
          sasl (0.1.3)
          setuptools (18.0.1)
          sh (1.11)
          simplejson (3.3.0)
          six (1.9.0)
          sqlparse (0.1.15)
          texttable (0.8.3)
          thrift-sasl (0.1.0)
          traceback2 (1.4.0)
          unittest2 (1.1.0)
          Werkzeug (0.11.3)
          wheel (0.24.0)
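Spotting a missing package by eye in a listing this long is error-prone. A small sketch that compares `name == version` pins from a requirements file with `name (version)` lines from `pip list` output is shown below. It assumes the two text formats that appear in this thread, and the function names are illustrative.

```python
import re


def normalize(name):
    """Treat '-', '_' and '.' as equivalent and ignore case (PEP 503 style)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_requirements(text):
    """Map normalized package name -> pinned version from 'name == version' lines."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # strip comments
        if "==" in line:
            name, version = line.split("==", 1)
            pins[normalize(name.strip())] = version.strip()
    return pins


def parse_pip_list(text):
    """Map normalized package name -> version from 'name (version)' lines."""
    installed = {}
    for match in re.finditer(r"^\s*(\S+) \(([^)]+)\)\s*$", text, re.MULTILINE):
        installed[normalize(match.group(1))] = match.group(2)
    return installed


def missing_or_mismatched(pins, installed):
    """Return sorted names whose pinned version is absent or different."""
    return sorted(n for n, v in pins.items() if installed.get(n) != v)
```

Fed the impyla section of the requirements file quoted in this thread and the listing above, this would report only thrift, since thrift_sasl and thrift-sasl normalize to the same name.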
          
          mjacobs Matthew Jacobs added a comment -

          I think the same is true of the case where we have the issue on Ubuntu 16. The mention of /root/.python-eggs/ is just confusing. I think it's just a per-user cache, and we are still using the virtualenv sasl; it's just that that sasl is not compiled against the toolchain. Or perhaps I'm missing your point?

          dknupp David Knupp added a comment -

          So weirdly enough, when I manually invoke the python binary within infra/python/env/bin (or source infra/python/env/bin/activate to invoke the virtualenv), I have no problem importing the sasl library in the interpreter:

          Python 2.7.11+ (default, Apr 17 2016, 14:00:29)
          [GCC 5.3.1 20160413] on linux2
          Type "help", "copyright", "credits" or "license" for more information.
          >>> import sasl
          >>> sasl
          <module 'sasl' from '/data/impala-build/Impala/infra/python/env/local/lib/python2.7/site-packages/sasl/__init__.pyc'>
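Since the import succeeds here but fails when launched through the build scripts, the difference is likely in the environment rather than in the package itself. One way to compare is to run the same import in a child interpreter under different environments. The following is a minimal sketch; the function name and the idea of passing in alternative `env` dictionaries are assumptions for illustration.

```python
import subprocess
import sys


def try_import(module, python=sys.executable, env=None):
    """Run 'import module' in a child interpreter; return (ok, last stderr line)."""
    proc = subprocess.run(
        [python, "-c", "import " + module],
        env=env,  # None means inherit the current environment
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    lines = proc.stderr.strip().splitlines()
    return proc.returncode == 0, (lines[-1] if lines else "")
```

Calling it once with the plain shell environment and once with an environment whose LD_LIBRARY_PATH includes the toolchain directories would show whether that variable alone accounts for the ImportError.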
          
          mjacobs Matthew Jacobs added a comment -

          David Knupp, do you think you'd be able to investigate this? Lemme know if you wanna chat about it.


            People

            • Assignee:
              tarmstrong Tim Armstrong
              Reporter:
              mjacobs Matthew Jacobs