Apache Arrow / ARROW-8154

[Python] HDFS Filesystem does not set environment variables in pyarrow 0.16.0 release

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.16.0
    • Fix Version/s: 0.17.0
    • Component/s: Python
    • Labels: None

    Description

      In pyarrow 0.15.x, HDFS filesystem works as follows:

      If you set HADOOP_HOME env var, it looks for libhdfs.so in $HADOOP_HOME/lib/native.

      In pyarrow 0.16.x, if you set HADOOP_HOME, it looks for libhdfs.so in $HADOOP_HOME, which is incorrect behaviour on all systems I am using.
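To check which of the two layouts a given machine actually has, a small helper along these lines can be used. This is only a sketch: `find_libhdfs` is a hypothetical name, not part of pyarrow, and the two candidate paths are simply the ones described in this report.

```python
import os


def find_libhdfs(hadoop_home, lib_name="libhdfs.so"):
    """Return the first existing candidate path for lib_name, or None.

    Hypothetical helper, not a pyarrow API.
    """
    candidates = [
        # Location searched by pyarrow 0.15.x, according to this report.
        os.path.join(hadoop_home, "lib", "native", lib_name),
        # Location searched by pyarrow 0.16.0, according to this report.
        os.path.join(hadoop_home, lib_name),
    ]
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None
```

Calling `find_libhdfs("/usr/lib/hadoop")` on an affected system should show whether the library sits under `lib/native` rather than directly in HADOOP_HOME.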

      Also, CLASSPATH is no longer set automatically, which was very convenient. The problem is that I need HADOOP_HOME set correctly to be able to use other libraries, but have to reset it to use Apache Arrow, e.g.

      os.environ["HADOOP_HOME"] = "/usr/lib/hadoop"

      ..do stuff here..

      ...then connect to arrow...

      os.environ["HADOOP_HOME"] = "/usr/lib/hadoop/lib/native"

      hdfs = pyarrow.hdfs.connect(host, port)

      ...then reset my hadoop home...

      os.environ["HADOOP_HOME"] = "/usr/lib/hadoop"

      etc.
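The set/reset dance above can be wrapped in a context manager so that HADOOP_HOME is restored even if the connection fails. This is a minimal sketch of the workaround, not a fix; `temporary_env` is a hypothetical helper name, and it assumes the variable is read at the time of the connect call.

```python
import contextlib
import os


@contextlib.contextmanager
def temporary_env(name, value):
    """Set os.environ[name] to value inside the block, then restore it."""
    missing = object()  # sentinel: distinguishes "unset" from "empty string"
    previous = os.environ.get(name, missing)
    os.environ[name] = value
    try:
        yield
    finally:
        if previous is missing:
            # The variable did not exist before, so remove it again.
            os.environ.pop(name, None)
        else:
            os.environ[name] = previous


# Intended usage (requires pyarrow and a reachable cluster):
# with temporary_env("HADOOP_HOME", "/usr/lib/hadoop/lib/native"):
#     hdfs = pyarrow.hdfs.connect(host, port)
```

Outside the `with` block HADOOP_HOME has its original value again, so other libraries keep working.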

       

      Example:

      >>> os.environ["HADOOP_HOME"] = "/usr/lib/hadoop"
      >>> hdfs = pyarrow.hdfs.connect(host, port)
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
        File "/home/user/.conda/envs/retroscoring/lib/python3.6/site-packages/pyarrow/hdfs.py", line 215, in connect
          extra_conf=extra_conf)
        File "/home/user/.conda/envs/retroscoring/lib/python3.6/site-packages/pyarrow/hdfs.py", line 40, in __init__
          self._connect(host, port, user, kerb_ticket, driver, extra_conf)
        File "pyarrow/io-hdfs.pxi", line 89, in pyarrow.lib.HadoopFileSystem._connect
        File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
      OSError: Unable to load libhdfs: /usr/lib/hadoop/libhdfs.so: cannot open shared object file: No such file or directory
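For the CLASSPATH part of this report, the variable can be populated by hand before connecting. The sketch below assumes the Hadoop launcher script lives at `$HADOOP_HOME/bin/hadoop` and uses its `classpath --glob` subcommand. `hadoop_classpath` and `ensure_classpath` are hypothetical helper names, and this is not necessarily how pyarrow itself did it in 0.15.x.

```python
import os
import subprocess


def hadoop_classpath(hadoop_home, run=subprocess.check_output):
    """Return the classpath printed by `hadoop classpath --glob`."""
    # Assumption: the launcher script is at $HADOOP_HOME/bin/hadoop.
    hadoop_bin = os.path.join(hadoop_home, "bin", "hadoop")
    output = run([hadoop_bin, "classpath", "--glob"])
    return output.decode("utf-8").strip()


def ensure_classpath(hadoop_home, environ=os.environ, run=subprocess.check_output):
    """Set CLASSPATH from the Hadoop launcher unless it already mentions hadoop."""
    if "hadoop" not in environ.get("CLASSPATH", ""):
        environ["CLASSPATH"] = hadoop_classpath(hadoop_home, run=run)
    return environ["CLASSPATH"]
```

Calling `ensure_classpath("/usr/lib/hadoop")` once before `pyarrow.hdfs.connect(host, port)` would set CLASSPATH for the current process only. The `run` and `environ` parameters exist so the helper can be exercised without a Hadoop installation.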

       

            People

              Assignee: Unassigned
              Reporter: Eric Henry (l33tn00b)
              Votes: 0
              Watchers: 4
