Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 3.0.0
Description
I am using pyarrow 3.0.0 with Python 3.7 and Hadoop 2.10.1 on Windows 10 64-bit, together with pyspark 3.1.1, and I am facing the error below. I am not able to save a DataFrame to HDFS; with pyspark 3.0.0, saving DataFrames to HDFS worked fine.
Please help:
import pyarrow as pa
fs = pa.hdfs.connect(host='localhost', port=9001)
__main__:1: DeprecationWarning: pyarrow.hdfs.connect is deprecated as of 2.0.0, please use pyarrow.fs.HadoopFileSystem instead.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\1570513\Anaconda3\envs\on-premise-latest\lib\site-packages\pyarrow\hdfs.py", line 219, in connect
extra_conf=extra_conf
File "C:\Users\1570513\Anaconda3\envs\on-premise-latest\lib\site-packages\pyarrow\hdfs.py", line 229, in _connect
extra_conf=extra_conf)
File "C:\Users\1570513\Anaconda3\envs\on-premise-latest\lib\site-packages\pyarrow\hdfs.py", line 45, in _init_
self._connect(host, port, user, kerb_ticket, extra_conf)
File "pyarrow\io-hdfs.pxi", line 75, in pyarrow.lib.HadoopFileSystem._connect
File "pyarrow\error.pxi", line 99, in pyarrow.lib.check_status
OSError: Unable to load libhdfs: The specified module could not be found.
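For reference, below is a minimal sketch of the replacement API that the deprecation warning points to, assuming the same localhost:9001 NameNode. Note that both the deprecated binding and pyarrow.fs.HadoopFileSystem load libhdfs at runtime, so the OSError above usually means pyarrow cannot locate the library (it checks ARROW_LIBHDFS_DIR first, then falls back to the HADOOP_HOME layout); on Windows the library is built as hdfs.dll. The C:\hadoop-2.10.1\bin path here is only a placeholder for wherever your hdfs.dll actually lives.
import os
import pyarrow.fs

# Assumption: hdfs.dll sits in %HADOOP_HOME%\bin of a local Hadoop 2.10.1
# install. Point ARROW_LIBHDFS_DIR at the directory containing it before
# connecting, otherwise the same "Unable to load libhdfs" error is raised.
os.environ.setdefault("ARROW_LIBHDFS_DIR", r"C:\hadoop-2.10.1\bin")

# pyarrow.fs.HadoopFileSystem is the non-deprecated replacement for
# pyarrow.hdfs.connect (available since pyarrow 2.0.0).
hdfs = pyarrow.fs.HadoopFileSystem(host="localhost", port=9001)

# Quick smoke test: list the contents of the HDFS root.
print(hdfs.get_file_info(pyarrow.fs.FileSelector("/")))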