Apache Arrow / ARROW-5236

[Python] hdfs.connect() is trying to load libjvm on Windows


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Python
    • Environment: Windows 7 Enterprise, pyarrow 0.13.0

    Description

      This issue was originally reported at https://github.com/apache/arrow/issues/4215. I am raising a Jira per Wes McKinney's request.

      Summary:
      The following script

      $ cat expt2.py
      import pyarrow as pa
      fs = pa.hdfs.connect()
      

      tries to load libjvm on Windows 7, which is not expected.

      $ python ./expt2.py
      Traceback (most recent call last):
        File "./expt2.py", line 3, in <module>
          fs = pa.hdfs.connect()
        File "C:\ProgramData\Continuum\Anaconda\envs\scratch_py36_pyarrow\lib\site-packages\pyarrow\hdfs.py", line 183, in connect
          extra_conf=extra_conf)
        File "C:\ProgramData\Continuum\Anaconda\envs\scratch_py36_pyarrow\lib\site-packages\pyarrow\hdfs.py", line 37, in __init__
          self._connect(host, port, user, kerb_ticket, driver, extra_conf)
        File "pyarrow\io-hdfs.pxi", line 89, in pyarrow.lib.HadoopFileSystem._connect
        File "pyarrow\error.pxi", line 83, in pyarrow.lib.check_status
      pyarrow.lib.ArrowIOError: Unable to load libjvm
      

      There is no libjvm file in a Windows Java installation.

      $ echo $JAVA_HOME
      C:\Progra~1\Java\jdk1.8.0_141
      
      $ find $JAVA_HOME -iname '*libjvm*'
      <returns nothing.>
      
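      On Windows, the JVM shared library is named jvm.dll (for JDK 8, typically under %JAVA_HOME%\jre\bin\server), not libjvm.so, which is why the find above returns nothing. A minimal sketch of a platform-aware lookup follows; the candidate paths are common JDK 8 layouts and are assumptions for illustration, not pyarrow's actual search logic:

      import os
      import platform

      def find_jvm_library(java_home):
          """Return the path of the JVM shared library, or None if absent.

          Candidate subdirectories are common JDK 8 layouts (an assumption;
          vendors and newer Java versions lay things out differently).
          """
          if platform.system() == 'Windows':
              names = ['jvm.dll']
              subdirs = [r'jre\bin\server', r'bin\server']
          else:
              names = ['libjvm.so', 'libjvm.dylib']
              subdirs = ['jre/lib/amd64/server', 'lib/server']
          for subdir in subdirs:
              for name in names:
                  candidate = os.path.join(java_home, subdir, name)
                  if os.path.exists(candidate):
                      return candidate
          return None

      print(find_jvm_library(os.environ['JAVA_HOME']))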

      I see the libjvm error with both pyarrow 0.11.1 and pyarrow 0.13.0.

      Steps to reproduce the issue (with more details):

      Create the environment

      $ cat scratch_py36_pyarrow.yml
      name: scratch_py36_pyarrow
      channels:
        - defaults
      dependencies:
        - python=3.6.8
        - pyarrow
      
      $ conda env create -f scratch_py36_pyarrow.yml
      

      Apply the following patch to lib/site-packages/pyarrow/hdfs.py. I had to do this because the Hadoop installation that ships with the MapR (https://mapr.com/) Windows client only provides $HADOOP_HOME/bin/hadoop.cmd. There is no file named $HADOOP_HOME/bin/hadoop, so without this patch the subsequent subprocess.check_output call fails with FileNotFoundError. (A platform-aware variant is sketched after the patch.)

      $ cat ~/x/patch.txt
      131c131
      <         hadoop_bin = '{0}/bin/hadoop'.format(os.environ['HADOOP_HOME'])
      ---
      >         hadoop_bin = '{0}/bin/hadoop.cmd'.format(os.environ['HADOOP_HOME'])
      
      $ patch /c/ProgramData/Continuum/Anaconda/envs/scratch_py36_pyarrow/lib/site-packages/pyarrow/hdfs.py ~/x/patch.txt
      patching file /c/ProgramData/Continuum/Anaconda/envs/scratch_py36_pyarrow/lib/site-packages/pyarrow/hdfs.py
      
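      The patch hard-codes hadoop.cmd; a platform-aware variant would choose the launcher name at runtime instead. A minimal sketch of that idea (an illustration, not the actual hdfs.py code; it assumes HADOOP_HOME is set and uses Hadoop's standard `hadoop classpath --glob` command):

      import os
      import platform
      import subprocess

      def hadoop_classpath():
          """Query the Hadoop launcher for the JARs to put on CLASSPATH.

          On Windows the launcher ships as hadoop.cmd; elsewhere it is a
          plain 'hadoop' script.
          """
          launcher = 'hadoop.cmd' if platform.system() == 'Windows' else 'hadoop'
          hadoop_bin = os.path.join(os.environ['HADOOP_HOME'], 'bin', launcher)
          out = subprocess.check_output([hadoop_bin, 'classpath', '--glob'])
          return out.decode('utf-8').strip()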

      Activate the environment

      $ source activate scratch_py36_pyarrow
      
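      As a quick sanity check (not part of the original report), confirm the activated environment resolves the expected pyarrow build:

      $ python -c "import pyarrow; print(pyarrow.__version__)"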

      Sample script

      $ cat expt2.py
      import pyarrow as pa
      fs = pa.hdfs.connect()
      

      Execute the script

      $ python ./expt2.py
      Traceback (most recent call last):
        File "./expt2.py", line 3, in <module>
          fs = pa.hdfs.connect()
        File "C:\ProgramData\Continuum\Anaconda\envs\scratch_py36_pyarrow\lib\site-packages\pyarrow\hdfs.py", line 183, in connect
          extra_conf=extra_conf)
        File "C:\ProgramData\Continuum\Anaconda\envs\scratch_py36_pyarrow\lib\site-packages\pyarrow\hdfs.py", line 37, in __init__
          self._connect(host, port, user, kerb_ticket, driver, extra_conf)
        File "pyarrow\io-hdfs.pxi", line 89, in pyarrow.lib.HadoopFileSystem._connect
        File "pyarrow\error.pxi", line 83, in pyarrow.lib.check_status
      pyarrow.lib.ArrowIOError: Unable to load libjvm
      

People

    • Assignee: Unassigned
    • Reporter: Kamaraju (kamaraju)
