Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version: 0.17.1
Fix Version: None
Description
Problem
After upgrading pyarrow from 0.15 to 0.17, I ran into some trouble. I understand that libhdfs3 is no longer supported; however, in my case libhdfs does not work either. See below.
My experience with the Hadoop ecosystem is limited, so I may have made some mistakes. I installed Hortonworks HDP via the Ambari service on a virtual machine running on my PC.
Here is what I tried:
1. Just connecting:
%xmode Verbose
import pyarrow as pa
hdfs = pa.hdfs.connect(host='hdp.test.com', port=8020, user='hdfs')
—
FileNotFoundError: [Errno 2] No such file or directory: 'hadoop': 'hadoop' (1.txt)
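The FileNotFoundError suggests pyarrow tried to shell out to the `hadoop` executable (to run `hadoop classpath --glob` when CLASSPATH is unset) and did not find it on PATH. A possible workaround is to populate CLASSPATH yourself before connecting; this is a hedged sketch, and the idea of globbing the jars as a fallback is my assumption about what `hadoop classpath --glob` effectively produces:

```python
import glob
import os
import subprocess

def ensure_hadoop_classpath(hadoop_home):
    """Populate CLASSPATH before pa.hdfs.connect(), so pyarrow does not
    need to find the `hadoop` binary itself.

    hadoop_home is assumed to be the Hadoop install root, e.g.
    /usr/hdp/3.1.4.0-315/hadoop on this HDP cluster."""
    if os.environ.get("CLASSPATH"):
        return os.environ["CLASSPATH"]
    hadoop_bin = os.path.join(hadoop_home, "bin", "hadoop")
    if os.path.exists(hadoop_bin):
        # preferred: let hadoop compute its own glob classpath
        out = subprocess.check_output([hadoop_bin, "classpath", "--glob"])
        cp = out.decode().strip()
    else:
        # fallback: glob the jars manually, approximating `hadoop classpath --glob`
        jars = glob.glob(os.path.join(hadoop_home, "**", "*.jar"), recursive=True)
        cp = ":".join(jars)
    os.environ["CLASSPATH"] = cp
    return cp
```

Calling this before pa.hdfs.connect() should avoid the lookup of the `hadoop` binary entirely.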
2. Trying to bypass the driver == 'libhdfs' code path:
%xmode Verbose
import pyarrow as pa
hdfs = pa.HadoopFileSystem(host='hdp.test.com', port=8020, user='hdfs', driver=None)
—
OSError: Unable to load libjvm: /usr/java/latest//lib/server/libjvm.so: cannot open shared object file: No such file or directory (2.txt)
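This error shows libjvm being looked up under /usr/java/latest/, i.e. a stale JAVA_HOME, while the actual JVMs live under /usr/lib/jvm and /usr/jdk64 (see the find output below). A small sketch for picking a JAVA_HOME that really contains libjvm.so; the default glob patterns are assumptions based on this machine's layout:

```python
import glob
import os

def find_java_home(patterns=("/usr/lib/jvm/*", "/usr/jdk64/*")):
    """Return the first candidate directory that actually contains a
    libjvm.so somewhere below it, or None if nothing matches."""
    for pattern in patterns:
        for candidate in sorted(glob.glob(pattern)):
            hits = glob.glob(os.path.join(candidate, "**", "libjvm.so"),
                             recursive=True)
            if hits:
                return candidate
    return None

# e.g. before importing pyarrow:
# os.environ["JAVA_HOME"] = find_java_home() or "/usr/lib/jvm/java"
```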
3. With libhdfs3 it works:
import hdfs3
hdfs = hdfs3.HDFileSystem(host='hdp.test.com', port=8020, user='hdfs')
# list a remote folder
hdfs.ls('/data/', detail=False)
['/data/TimeSheet.2020-04-11', '/data/test', '/data/test.json']
Environment:
Client PC:
OS: Debian 10. Dev.: Anaconda3 (python 3.7.6), Jupyter Lab 2, pyarrow 0.17.1 (from conda-forge)
Hadoop (on VM – Oracle VirtualBox):
OS: Oracle Linux 7.6. Distr.: Hortonworks HDP 3.1.4
libhdfs.so:
[root@hdp /]# find / -name libhdfs.so
/usr/lib/ams-hbase/lib/hadoop-native/libhdfs.so
/usr/hdp/3.1.4.0-315/usr/lib/libhdfs.so
Java path:
[root@hdp /]# sudo alternatives --config java
-----------------------------------------------
*+ 1 java-1.8.0-openjdk.x86_64 (/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.252.b09-2.el7_8.x86_64/jre/bin/java)
libjvm:
[root@hdp /]# find / -name libjvm.*
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.252.b09-2.el7_8.x86_64/jre/lib/amd64/server/libjvm.so
/usr/jdk64/jdk1.8.0_112/jre/lib/amd64/server/libjvm.so
I tried many settings; below is the last one:
- /etc/profile:
...
export JAVA_HOME=$(dirname $(dirname $(readlink $(readlink $(which javac)))))
export JRE_HOME=$JAVA_HOME/jre
export JAVA_CLASSPATH=$JAVA_HOME/jre/lib:$JAVA_HOME/lib:$JAVA_HOME/lib/tools.jar
export HADOOP_HOME=/usr/hdp/3.1.4.0-315/hadoop
export HADOOP_CLASSPATH=$(find $HADOOP_HOME -name '*.jar' | xargs echo | tr ' ' ':')
export ARROW_LIBHDFS_DIR=/usr/lib/ams-hbase/lib/hadoop-native
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export CLASSPATH=.:$CLASSPATH:$JAVA_CLASSPATH:$HADOOP_CLASSPATH
export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JRE_HOME/lib/amd64/server
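The exports above can be sanity-checked from Python before launching Jupyter. A minimal sketch: the variable names match the profile above, and the libjvm sub-path assumes the OpenJDK 8 JRE layout shown in the find output:

```python
import os

REQUIRED = ("JAVA_HOME", "HADOOP_HOME", "ARROW_LIBHDFS_DIR", "CLASSPATH")

def check_hdfs_env():
    """Report env-var problems that would break pyarrow's libhdfs driver."""
    problems = []
    for var in REQUIRED:
        if not os.environ.get(var):
            problems.append(var + " is unset")
    # JDK 8 layout, as seen in the `find / -name libjvm.*` output above
    libjvm = os.path.join(os.environ.get("JAVA_HOME", ""),
                          "jre", "lib", "amd64", "server", "libjvm.so")
    if not os.path.exists(libjvm):
        problems.append("libjvm.so not found at " + libjvm)
    libhdfs = os.path.join(os.environ.get("ARROW_LIBHDFS_DIR", ""),
                           "libhdfs.so")
    if not os.path.exists(libhdfs):
        problems.append("libhdfs.so not found at " + libhdfs)
    return problems
```

Running this inside the same Jupyter kernel that fails helps confirm whether the profile exports actually reached the notebook process.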