Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
In the Python library, if an instance of `HadoopFileSystem` is garbage collected, all other existing instances become invalid. I haven't checked with a C++-only example, but from reading the Cython code I can't see how Cython is responsible, so I think this is a bug in the C++ library.
>>> import pyarrow as pa
>>> h = pa.hdfs.connect()
18/01/24 16:54:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/01/24 16:54:26 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
>>> h.ls("/")
['/benchmarks', '/hbase', '/tmp', '/user', '/var']
>>> h2 = pa.hdfs.connect()
>>> del h  # close one client
>>> h2.ls("/")  # all filesystem operations now fail
hdfsListDirectory(/): FileSystem#listStatus error:
IOException: Filesystem closed
java.io.IOException: Filesystem closed
	at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:865)
	at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:2106)
	at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:2092)
	at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:743)
	at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:113)
	at org.apache.hadoop.hdfs.DistributedFileSystem$16.doCall(DistributedFileSystem.java:808)
	at org.apache.hadoop.hdfs.DistributedFileSystem$16.doCall(DistributedFileSystem.java:804)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:804)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/conda/lib/python3.6/site-packages/pyarrow/hdfs.py", line 88, in ls
    return super(HadoopFileSystem, self).ls(path, detail)
  File "io-hdfs.pxi", line 248, in pyarrow.lib.HadoopFileSystem.ls
  File "error.pxi", line 79, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: HDFS: list directory failed
>>> h2.is_open  # The Python object still thinks it's open
True
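The failure pattern can be illustrated without an HDFS cluster. This is a minimal sketch, assuming (as the transcript suggests) that all clients share one underlying connection and that a client's destructor closes it unconditionally; the `Client` and `SharedConnection` names are hypothetical stand-ins, not the real pyarrow or libhdfs classes.

```python
class SharedConnection:
    """Hypothetical stand-in for the process-wide HDFS connection."""
    def __init__(self):
        self.open = True

    def close(self):
        self.open = False


class Client:
    """Hypothetical stand-in for HadoopFileSystem.

    Every instance holds a reference to the same shared connection.
    """
    _shared = None

    def __init__(self):
        if Client._shared is None or not Client._shared.open:
            Client._shared = SharedConnection()
        self.conn = Client._shared

    def __del__(self):
        # Bug pattern: the destructor closes the shared connection,
        # invalidating every other client that still references it.
        self.conn.close()

    def ls(self):
        if not self.conn.open:
            raise IOError("Filesystem closed")
        return ["/tmp"]


h = Client()
h2 = Client()
del h        # first client collected: shared connection is closed
# h2.ls() now raises IOError("Filesystem closed")
```

One plausible fix for this pattern is shared ownership of the connection (reference counting, e.g. `std::shared_ptr` on the C++ side), so the connection is closed only when the last client referencing it is destroyed.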