Apache Arrow / ARROW-8988

[Python] After upgrading pyarrow from 0.15 to 0.17.1, connecting to HDFS does not work with libhdfs (JNI)

Details

    Description

      Problem

      After upgrading pyarrow from 0.15 to 0.17, I ran into some trouble. I understand that libhdfs3 is no longer supported. However, in my case libhdfs does not work either. See below.

      I do not have much experience with the Hadoop ecosystem, so I may have made some mistakes. I installed Hortonworks HDP through the Ambari service on a virtual machine running on my PC.

      Here is what I tried:

      1. Just connect:

      %xmode Verbose
      import pyarrow as pa

      hdfs = pa.hdfs.connect(host='hdp.test.com', port=8020, user='hdfs')

      FileNotFoundError: [Errno 2] No such file or directory: 'hadoop': 'hadoop' (1.txt)
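The error message suggests that a 'hadoop' executable was looked up and not found by the Python process. A minimal sketch for checking that from the same process, using only the standard library. The helper name find_hadoop_cli is hypothetical, and the idea that pyarrow runs the hadoop command at this point is an assumption based on the error text, not something confirmed here.

```python
import os
import shutil


def find_hadoop_cli(env=None):
    """Return the path of a `hadoop` executable visible to this process, or None."""
    env = os.environ if env is None else env
    # Look on PATH first, the way a subprocess call would resolve the name.
    found = shutil.which("hadoop", path=env.get("PATH", ""))
    if found:
        return found
    # Fall back to $HADOOP_HOME/bin/hadoop if HADOOP_HOME is set.
    home = env.get("HADOOP_HOME")
    if home:
        candidate = os.path.join(home, "bin", "hadoop")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


print(find_hadoop_cli())
```

If this prints None inside Jupyter but a path in a terminal, the kernel is probably running with a different environment than the login shell.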

      2. Bypass the if driver == 'libhdfs' check:

      %xmode Verbose

      import pyarrow as pa

      hdfs = pa.HadoopFileSystem(host='hdp.test.com', port=8020, user='hdfs', driver=None)

      OSError: Unable to load libjvm: /usr/java/latest//lib/server/libjvm.so: cannot open shared object file: No such file or directory (2.txt)
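To compare the path in this error with what is actually installed, a small sketch that lists every libjvm.so under the Java home seen by the Python process. The helper name find_libjvm is hypothetical. It only inspects the filesystem and does not reproduce how libjvm is actually located at load time.

```python
import os


def find_libjvm(java_home):
    """Return sorted paths of every libjvm.so found under java_home."""
    matches = []
    for root, _dirs, files in os.walk(java_home):
        if "libjvm.so" in files:
            matches.append(os.path.join(root, "libjvm.so"))
    return sorted(matches)


# JAVA_HOME as seen by this Python process; it may differ from the login shell.
java_home = os.environ.get("JAVA_HOME")
print("JAVA_HOME =", java_home)
if java_home:
    print(find_libjvm(java_home))
```

An empty list, or a JAVA_HOME of None, would mean the process has no usable Java location configured.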

      3. With libhdfs3 it works:

      import hdfs3 

      hdfs = hdfs3.HDFileSystem(host='hdp.test.com', port=8020, user='hdfs')

      # list the remote folder
      hdfs.ls('/data/', detail=False)

      ['/data/TimeSheet.2020-04-11', '/data/test', '/data/test.json']

      Environment.

      Client PC:

      OS: Debian 10. Dev.: Anaconda3 (Python 3.7.6), JupyterLab 2, pyarrow 0.17.1 (from conda-forge)

      Hadoop (on VM – Oracle VirtualBox):

      OS: Oracle Linux 7.6.  Distr.: Hortonworks HDP 3.1.4

      libhdfs.so:

      [root@hdp /]# find / -name libhdfs.so
      /usr/lib/ams-hbase/lib/hadoop-native/libhdfs.so
      /usr/hdp/3.1.4.0-315/usr/lib/libhdfs.so

       

       Java path:

      [root@hdp /]# sudo alternatives --config java

      -----------------------------------------------
      *+ 1           java-1.8.0-openjdk.x86_64 (/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.252.b09-2.el7_8.x86_64/jre/bin/java)

       

      libjvm:               

      [root@hdp /]# find / -name libjvm.*
      /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.252.b09-2.el7_8.x86_64/jre/lib/amd64/server/libjvm.so
      /usr/jdk64/jdk1.8.0_112/jre/lib/amd64/server/libjvm.so

       

      I tried many settings. The most recent ones are below:

      1. /etc/profile:
        ...
        export JAVA_HOME=$(dirname $(dirname $(readlink $(readlink $(which javac)))))
        export JRE_HOME=$JAVA_HOME/jre
        export JAVA_CLASSPATH=$JAVA_HOME/jre/lib:$JAVA_HOME/lib:$JAVA_HOME/lib/tools.jar
        export HADOOP_HOME=/usr/hdp/3.1.4.0-315/hadoop
        export HADOOP_CLASSPATH=$(find $HADOOP_HOME -name '*.jar' | xargs echo | tr ' ' ':')
        export ARROW_LIBHDFS_DIR=/usr/lib/ams-hbase/lib/hadoop-native

        export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
        export CLASSPATH=.:$CLASSPATH:$JAVA_CLASSPATH:$HADOOP_CLASSPATH

        export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JRE_HOME/lib/amd64/server
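Exports in /etc/profile are read by login shells, so a Jupyter kernel started some other way may not see them. A sketch that sets the main variables inside the Python process before connecting. The helper name hdfs_env is hypothetical, the directory values are copied from the listings above and have to exist on the machine that runs Python, and whether this resolves the errors is untested.

```python
import os


def hdfs_env(java_home, hadoop_home, libhdfs_dir):
    """Build the environment variables used by the settings above."""
    return {
        "JAVA_HOME": java_home,
        "HADOOP_HOME": hadoop_home,
        "ARROW_LIBHDFS_DIR": libhdfs_dir,
    }


env = hdfs_env(
    java_home="/usr/jdk64/jdk1.8.0_112",
    hadoop_home="/usr/hdp/3.1.4.0-315/hadoop",
    libhdfs_dir="/usr/lib/ams-hbase/lib/hadoop-native",
)
os.environ.update(env)  # apply before pyarrow tries to load libhdfs

# Then, in the same process:
# import pyarrow as pa
# hdfs = pa.hdfs.connect(host='hdp.test.com', port=8020, user='hdfs')
```

Setting the variables in the process itself removes any doubt about which shell configuration the kernel inherited.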

       
       

      Attachments

        1. 2.txt
          2 kB
          Pavel Dourugyan
        2. 1.txt
          5 kB
          Pavel Dourugyan

          People

            Assignee: Unassigned
            Reporter: Pavel Dourugyan (pavel_durugyan)