Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Duplicate
- Environment: Windows 7 Enterprise, pyarrow 0.13.0
Description
This issue was originally reported at https://github.com/apache/arrow/issues/4215; raising a Jira per Wes McKinney's request.
Summary:
The following script

```
$ cat expt2.py
import pyarrow as pa
fs = pa.hdfs.connect()
```

tries to load libjvm on Windows 7, which is not expected:

```
$ python ./expt2.py
Traceback (most recent call last):
  File "./expt2.py", line 3, in <module>
    fs = pa.hdfs.connect()
  File "C:\ProgramData\Continuum\Anaconda\envs\scratch_py36_pyarrow\lib\site-packages\pyarrow\hdfs.py", line 183, in connect
    extra_conf=extra_conf)
  File "C:\ProgramData\Continuum\Anaconda\envs\scratch_py36_pyarrow\lib\site-packages\pyarrow\hdfs.py", line 37, in __init__
    self._connect(host, port, user, kerb_ticket, driver, extra_conf)
  File "pyarrow\io-hdfs.pxi", line 89, in pyarrow.lib.HadoopFileSystem._connect
  File "pyarrow\error.pxi", line 83, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: Unable to load libjvm
```
There is no libjvm file in a Windows Java installation:
```
$ echo $JAVA_HOME
C:\Progra~1\Java\jdk1.8.0_141
$ find $JAVA_HOME -iname '*libjvm*'
<returns nothing.>
```
I see the libjvm error with both 0.11.1 and 0.13.0 versions of pyarrow.
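For context, the JVM shared library has a different file name per platform: jvm.dll on Windows (which is why a search for "libjvm" under JAVA_HOME finds nothing there), libjvm.so on Linux, and libjvm.dylib on macOS. A minimal sketch of a platform-aware lookup, assuming this naming convention; the helper names here are my own illustration, not pyarrow's API:

```python
import os
import sys


def jvm_library_name():
    """Return the platform-specific JVM shared-library file name.

    Illustrative sketch only: on Windows the JVM ships as jvm.dll
    (typically under JAVA_HOME\\jre\\bin\\server), not libjvm.so.
    """
    if sys.platform == 'win32':
        return 'jvm.dll'
    if sys.platform == 'darwin':
        return 'libjvm.dylib'
    return 'libjvm.so'


def find_jvm_library(java_home):
    """Walk JAVA_HOME looking for this platform's JVM library."""
    target = jvm_library_name()
    for root, _dirs, files in os.walk(java_home):
        if target in files:
            return os.path.join(root, target)
    return None
```

A loader that only ever looks for a "libjvm" name would, as observed above, fail on Windows even with a perfectly good JDK installed.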
Steps to reproduce the issue (with more details):
Create the environment
```
$ cat scratch_py36_pyarrow.yml
name: scratch_py36_pyarrow
channels:
  - defaults
dependencies:
  - python=3.6.8
  - pyarrow
```
```
$ conda env create -f scratch_py36_pyarrow.yml
```
Apply the following patch to lib/site-packages/pyarrow/hdfs.py. I had to do this because the Hadoop installation that ships with the MapR (https://mapr.com/) Windows client only provides $HADOOP_HOME/bin/hadoop.cmd; there is no file named $HADOOP_HOME/bin/hadoop, so the subsequent subprocess.check_output call fails with FileNotFoundError if this patch is not applied.
```
$ cat ~/x/patch.txt
131c131
< hadoop_bin = '{0}/bin/hadoop'.format(os.environ['HADOOP_HOME'])
---
> hadoop_bin = '{0}/bin/hadoop.cmd'.format(os.environ['HADOOP_HOME'])
$ patch /c/ProgramData/Continuum/Anaconda/envs/scratch_py36_pyarrow/lib/site-packages/pyarrow/hdfs.py ~/x/patch.txt
patching file /c/ProgramData/Continuum/Anaconda/envs/scratch_py36_pyarrow/lib/site-packages/pyarrow/hdfs.py
```
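Rather than hard-coding hadoop.cmd as the patch does, the same effect could be achieved with a fallback. This is a hypothetical sketch of that idea, not pyarrow's actual code; the function name and fallback logic are my own:

```python
import os
import sys


def hadoop_bin():
    """Pick the Hadoop launcher, falling back to hadoop.cmd on Windows.

    Sketch of the behaviour the patch above hard-codes: the MapR
    Windows client ships only bin/hadoop.cmd, so prefer the plain
    'hadoop' script but fall back to the .cmd wrapper when the plain
    script is missing on Windows.
    """
    base = os.path.join(os.environ['HADOOP_HOME'], 'bin', 'hadoop')
    if sys.platform == 'win32' and not os.path.exists(base):
        return base + '.cmd'
    return base
```

On non-Windows platforms this reduces to the original behaviour, so it would not disturb existing Linux and macOS setups.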
Activate the environment
```
$ source activate scratch_py36_pyarrow
```
Sample script
```
$ cat expt2.py
import pyarrow as pa
fs = pa.hdfs.connect()
```
Execute the script
```
$ python ./expt2.py
Traceback (most recent call last):
  File "./expt2.py", line 3, in <module>
    fs = pa.hdfs.connect()
  File "C:\ProgramData\Continuum\Anaconda\envs\scratch_py36_pyarrow\lib\site-packages\pyarrow\hdfs.py", line 183, in connect
    extra_conf=extra_conf)
  File "C:\ProgramData\Continuum\Anaconda\envs\scratch_py36_pyarrow\lib\site-packages\pyarrow\hdfs.py", line 37, in __init__
    self._connect(host, port, user, kerb_ticket, driver, extra_conf)
  File "pyarrow\io-hdfs.pxi", line 89, in pyarrow.lib.HadoopFileSystem._connect
  File "pyarrow\error.pxi", line 83, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: Unable to load libjvm
```
Attachments
Issue Links
- is superseded by
-
ARROW-11642 [C++] Incorrect preprocessor directive for Windows in JVM detection
- Resolved