Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Duplicate
Affects Version/s: 0.16.0
Fix Version/s: None
Description
In pyarrow 0.15.x, the HDFS filesystem works as follows:
If you set the HADOOP_HOME env var, it looks for libhdfs.so in $HADOOP_HOME/lib/native.
In pyarrow 0.16.x, if you set HADOOP_HOME, it looks for libhdfs.so directly in $HADOOP_HOME, which is incorrect behaviour on all systems I am using.
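To illustrate roughly where each version ends up looking (paths are the ones from my systems; the exact probing logic inside pyarrow may differ):

import os

os.environ["HADOOP_HOME"] = "/usr/lib/hadoop"
# pyarrow 0.15.x effectively tries:
print(os.path.join(os.environ["HADOOP_HOME"], "lib", "native", "libhdfs.so"))  # /usr/lib/hadoop/lib/native/libhdfs.so (exists here)
# pyarrow 0.16.x effectively tries:
print(os.path.join(os.environ["HADOOP_HOME"], "libhdfs.so"))  # /usr/lib/hadoop/libhdfs.so (does not exist)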
Also, CLASSPATH no longer gets set automatically, which was very convenient. The issue here is that I need HADOOP_HOME set correctly to be able to use other libraries, but have to repoint it to use Apache Arrow, e.g.:
os.environ["HADOOP_HOME"] = "/usr/lib/hadoop"
..do stuff here..
...then connect to arrow...
os.environ["HADOOP_HOME"] = "/usr/lib/hadoop/lib/native"
hdfs = pyarrow.hdfs.connect(host, port)
...then reset my hadoop home...
os.environ["HADOOP_HOME"] = "/usr/lib/hadoop"
etc.
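A possible way to avoid juggling HADOOP_HOME, assuming the ARROW_LIBHDFS_DIR override and the "hadoop classpath --glob" helper apply to this version (I have not verified this on 0.16.0), would be something like:

import os
import subprocess
import pyarrow

os.environ["HADOOP_HOME"] = "/usr/lib/hadoop"                    # stays correct for other libraries
os.environ["ARROW_LIBHDFS_DIR"] = "/usr/lib/hadoop/lib/native"   # tell Arrow where libhdfs.so lives
# CLASSPATH is no longer populated automatically, so build it from the Hadoop CLI:
os.environ["CLASSPATH"] = subprocess.check_output(
    ["hadoop", "classpath", "--glob"]).decode().strip()

hdfs = pyarrow.hdfs.connect(host, port)  # host/port as in the example below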
Example:
>>> os.environ["HADOOP_HOME"] = "/usr/lib/hadoop"
>>> hdfs = pyarrow.hdfs.connect(host, port)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/.conda/envs/retroscoring/lib/python3.6/site-packages/pyarrow/hdfs.py", line 215, in connect
    extra_conf=extra_conf)
  File "/home/user/.conda/envs/retroscoring/lib/python3.6/site-packages/pyarrow/hdfs.py", line 40, in __init__
    self._connect(host, port, user, kerb_ticket, driver, extra_conf)
  File "pyarrow/io-hdfs.pxi", line 89, in pyarrow.lib.HadoopFileSystem._connect
  File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
OSError: Unable to load libhdfs: /usr/lib/hadoop/libhdfs.so: cannot open shared object file: No such file or directory
Issue Links
- is duplicated by ARROW-7841 [C++] HADOOP_HOME doesn't work to find libhdfs.so (Resolved)