Apache Arrow / ARROW-5236

[Python] hdfs.connect() is trying to load libjvm on Windows


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Python
    • Environment: Windows 7 Enterprise, pyarrow 0.13.0

    Description

      This issue was originally reported at https://github.com/apache/arrow/issues/4215. I am raising a Jira per Wes McKinney's request.

      Summary:
      The following script

      $ cat expt2.py
      import pyarrow as pa
      fs = pa.hdfs.connect()
      

      tries to load libjvm on Windows 7, which is not expected.

      $ python ./expt2.py
      Traceback (most recent call last):
        File "./expt2.py", line 3, in <module>
          fs = pa.hdfs.connect()
        File "C:\ProgramData\Continuum\Anaconda\envs\scratch_py36_pyarrow\lib\site-packages\pyarrow\hdfs.py", line 183, in connect
          extra_conf=extra_conf)
        File "C:\ProgramData\Continuum\Anaconda\envs\scratch_py36_pyarrow\lib\site-packages\pyarrow\hdfs.py", line 37, in __init__
          self._connect(host, port, user, kerb_ticket, driver, extra_conf)
        File "pyarrow\io-hdfs.pxi", line 89, in pyarrow.lib.HadoopFileSystem._connect
        File "pyarrow\error.pxi", line 83, in pyarrow.lib.check_status
      pyarrow.lib.ArrowIOError: Unable to load libjvm
      

      There is no libjvm file in a Windows Java installation.

      $ echo $JAVA_HOME
      C:\Progra~1\Java\jdk1.8.0_141
      
      $ find $JAVA_HOME -iname '*libjvm*'
      <returns nothing.>
      
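      On Windows, the JVM shared library is named jvm.dll (for JDK 8, typically under %JAVA_HOME%\jre\bin\server), not libjvm.so, which is why the find above returns nothing. A minimal sketch of a platform-aware lookup follows; the candidate paths are common JDK 8 layouts and are assumptions for illustration, not pyarrow's actual search logic:

      import os
      import platform

      def find_jvm_library(java_home):
          """Return the path of the JVM shared library, or None if absent.

          Candidate subdirectories are common JDK 8 layouts (an assumption;
          vendors and newer Java versions lay things out differently).
          """
          if platform.system() == 'Windows':
              names = ['jvm.dll']
              subdirs = [r'jre\bin\server', r'bin\server']
          else:
              names = ['libjvm.so', 'libjvm.dylib']
              subdirs = ['jre/lib/amd64/server', 'lib/server']
          for subdir in subdirs:
              for name in names:
                  candidate = os.path.join(java_home, subdir, name)
                  if os.path.exists(candidate):
                      return candidate
          return None

      print(find_jvm_library(os.environ['JAVA_HOME']))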

      I see the libjvm error with both pyarrow 0.11.1 and pyarrow 0.13.0.

      Steps to reproduce the issue (with more details):

      Create the environment

      $ cat scratch_py36_pyarrow.yml
      name: scratch_py36_pyarrow
      channels:
        - defaults
      dependencies:
        - python=3.6.8
        - pyarrow
      
      $ conda env create -f scratch_py36_pyarrow.yml
      

      Apply the following patch to lib/site-packages/pyarrow/hdfs.py. I had to do this because the Hadoop installation that ships with the MapR (https://mapr.com/) Windows client only provides $HADOOP_HOME/bin/hadoop.cmd. There is no file named $HADOOP_HOME/bin/hadoop, so without this patch the subsequent subprocess.check_output call fails with FileNotFoundError. (A platform-aware variant is sketched after the patch.)

      $ cat ~/x/patch.txt
      131c131
      <         hadoop_bin = '{0}/bin/hadoop'.format(os.environ['HADOOP_HOME'])
      ---
      >         hadoop_bin = '{0}/bin/hadoop.cmd'.format(os.environ['HADOOP_HOME'])
      
      $ patch /c/ProgramData/Continuum/Anaconda/envs/scratch_py36_pyarrow/lib/site-packages/pyarrow/hdfs.py ~/x/patch.txt
      patching file /c/ProgramData/Continuum/Anaconda/envs/scratch_py36_pyarrow/lib/site-packages/pyarrow/hdfs.py
      
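      The patch hard-codes hadoop.cmd; a platform-aware variant would choose the launcher name at runtime instead. A minimal sketch of that idea (an illustration, not the actual hdfs.py code; it assumes HADOOP_HOME is set and uses Hadoop's standard `hadoop classpath --glob` command):

      import os
      import platform
      import subprocess

      def hadoop_classpath():
          """Query the Hadoop launcher for the JARs to put on CLASSPATH.

          On Windows the launcher ships as hadoop.cmd; elsewhere it is a
          plain 'hadoop' script.
          """
          launcher = 'hadoop.cmd' if platform.system() == 'Windows' else 'hadoop'
          hadoop_bin = os.path.join(os.environ['HADOOP_HOME'], 'bin', launcher)
          out = subprocess.check_output([hadoop_bin, 'classpath', '--glob'])
          return out.decode('utf-8').strip()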

      Activate the environment

      $ source activate scratch_py36_pyarrow
      
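      As a quick sanity check (not part of the original report), confirm the activated environment resolves the expected pyarrow build:

      $ python -c "import pyarrow; print(pyarrow.__version__)"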

      Sample script

      $ cat expt2.py
      import pyarrow as pa
      fs = pa.hdfs.connect()
      

      Execute the script

      $ python ./expt2.py
      Traceback (most recent call last):
        File "./expt2.py", line 3, in <module>
          fs = pa.hdfs.connect()
        File "C:\ProgramData\Continuum\Anaconda\envs\scratch_py36_pyarrow\lib\site-packages\pyarrow\hdfs.py", line 183, in connect
          extra_conf=extra_conf)
        File "C:\ProgramData\Continuum\Anaconda\envs\scratch_py36_pyarrow\lib\site-packages\pyarrow\hdfs.py", line 37, in __init__
          self._connect(host, port, user, kerb_ticket, driver, extra_conf)
        File "pyarrow\io-hdfs.pxi", line 89, in pyarrow.lib.HadoopFileSystem._connect
        File "pyarrow\error.pxi", line 83, in pyarrow.lib.check_status
      pyarrow.lib.ArrowIOError: Unable to load libjvm
      

People

    • Assignee: Unassigned
    • Reporter: Kamaraju (kamaraju)
