Apache Arrow / ARROW-9019

[Python] hdfs fails to connect to HDFS 3.x cluster

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • Python

    Description

      I'm trying to use the pyarrow hdfs connector with Hadoop 3.1.3 and I get an error that looks like a protobuf or jar mismatch problem with Hadoop. The same code works on a Hadoop 2.9 cluster.
       
      Is there something special I need to do, or does pyarrow not support Hadoop 3.x yet?
      Note that I tried pyarrow 0.15.1, 0.16.0, and 0.17.1.
       
          import pyarrow as pa
          hdfs_kwargs = dict(host="namenodehost",
                            port=9000,
                            user="tgraves",
                            driver='libhdfs',
                            kerb_ticket=None,
                            extra_conf=None)
          fs = pa.hdfs.connect(**hdfs_kwargs)
          res = fs.exists("/user/tgraves")
       
      The error I get on Hadoop 3.x is:
       
      dfsExists: invokeMethod((Lorg/apache/hadoop/fs/Path;)Z) error:
      ClassCastException: org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$GetFileInfoRequestProto cannot be cast to org.apache.hadoop.shaded.com.google.protobuf.Message
      java.lang.ClassCastException: org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$GetFileInfoRequestProto cannot be cast to org.apache.hadoop.shaded.com.google.protobuf.Message
              at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
              at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
              at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
              at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:904)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.lang.reflect.Method.invoke(Method.java:498)
              at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
              at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
              at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
              at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
              at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
              at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
              at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1661)
              at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1577)
              at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1574)
              at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
              at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1589)
              at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1683)
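
      The cast failure above involves a relocated class (org.apache.hadoop.shaded.com.google.protobuf.Message). One possible explanation, not confirmed in this report, is that the CLASSPATH seen by libhdfs mixes Hadoop 3.x shaded client jars with the regular, unshaded ones. The sketch below is a rough diagnostic for that hypothesis only. The jar-name prefixes are assumptions based on the usual Hadoop 3.x artifact names, and the helper functions are hypothetical, not part of pyarrow.

```python
import os

# Assumed artifact-name prefixes; adjust for your distribution.
# Shaded client jars relocate third-party classes under org.apache.hadoop.shaded.
SHADED_PREFIXES = ("hadoop-client-api", "hadoop-client-runtime")
# Regular jars reference the unrelocated com.google.protobuf classes.
UNSHADED_PREFIXES = ("hadoop-hdfs-client", "hadoop-common")


def classify_hadoop_jars(classpath, sep=os.pathsep):
    """Group classpath entries into shaded and unshaded Hadoop client jars.

    Only explicit *.jar entries are inspected; directories and unexpanded
    wildcards are skipped.
    """
    found = {"shaded": [], "unshaded": []}
    for entry in classpath.split(sep):
        # Take the file name without relying on the host OS path rules.
        name = entry.replace("\\", "/").rsplit("/", 1)[-1]
        if not name.endswith(".jar"):
            continue
        if name.startswith(SHADED_PREFIXES):
            found["shaded"].append(entry)
        elif name.startswith(UNSHADED_PREFIXES):
            found["unshaded"].append(entry)
    return found


def has_mixed_hadoop_jars(classpath, sep=os.pathsep):
    """Return True if both shaded and unshaded Hadoop jars are present."""
    found = classify_hadoop_jars(classpath, sep)
    return bool(found["shaded"]) and bool(found["unshaded"])


if __name__ == "__main__":
    # Inspect the CLASSPATH of the current process, which libhdfs also reads.
    report = classify_hadoop_jars(os.environ.get("CLASSPATH", ""))
    for kind, jars in report.items():
        print(kind, len(jars))
        for jar in jars:
            print("  ", jar)
```

      If both groups turn out to be non-empty, that would be consistent with the jar-mismatch suspicion in the description, though it would not prove it. An empty result may only mean the CLASSPATH uses wildcards or directories that this sketch does not expand.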

          People

            Assignee: Unassigned
            Reporter: Thomas Graves (tgraves)
