  Apache Arrow / ARROW-5922

[Python] Unable to connect to HDFS from a worker/data node on a Kerberized cluster using pyarrow's hdfs API


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Works for Me
    • Affects Version/s: 0.14.0
    • Fix Version/s: 0.14.0
    • Component/s: Python
    • Labels:
      None
    • Environment:
      Unix

      Description

      Here's what I'm trying:

      ```
      import pyarrow as pa

      conf = {"hadoop.security.authentication": "kerberos"}

      fs = pa.hdfs.connect(kerb_ticket="/tmp/krb5cc_44444", extra_conf=conf)
      ```

      However, when I submit this job to the cluster using Dask-YARN, I get the following error:

      ```

      File "test/run.py", line 3
        fs = pa.hdfs.connect(kerb_ticket="/tmp/krb5cc_44444", extra_conf=conf)
      File "/opt/hadoop/data/10/hadoop/yarn/local/usercache/hdfsf6/appcache/application_1560931326013_183242/container_e47_1560931326013_183242_01_000003/environment/lib/python3.7/site-packages/pyarrow/hdfs.py", line 211, in connect
      File "/opt/hadoop/data/10/hadoop/yarn/local/usercache/hdfsf6/appcache/application_1560931326013_183242/container_e47_1560931326013_183242_01_000003/environment/lib/python3.7/site-packages/pyarrow/hdfs.py", line 38, in __init__
      File "pyarrow/io-hdfs.pxi", line 105, in pyarrow.lib.HadoopFileSystem._connect
      File "pyarrow/error.pxi", line 83, in pyarrow.lib.check_status
      pyarrow.lib.ArrowIOError: HDFS connection failed

      ```

      I also tried setting host (to a name node) and port (8020) explicitly, but I run into the same error. Since the error message is not descriptive, I'm not sure which setting needs to be changed. Any clues?
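Because "HDFS connection failed" does not say what went wrong, one way to narrow it down is to check the worker's environment just before the connect call. The following is only a minimal sketch under assumptions: `preflight` is a hypothetical helper (not part of pyarrow), and the variables it checks are the ones pyarrow's libhdfs driver documents as inputs (`JAVA_HOME`, `HADOOP_HOME`, `CLASSPATH`). Whether any of them is the actual cause here is not known.

```python
import os

# Variables the libhdfs-based driver is documented to rely on.
# ARROW_LIBHDFS_DIR is optional, so it is deliberately not checked here.
HDFS_ENV_VARS = ("JAVA_HOME", "HADOOP_HOME", "CLASSPATH")


def preflight(ticket_path, environ=None):
    """Return a list of problems found on this node (empty if none).

    Hypothetical helper: it only inspects the local filesystem and the
    environment; it does not contact the KDC or the name node.
    """
    environ = os.environ if environ is None else environ
    problems = []

    # The Kerberos ticket cache must exist on the node that runs the code,
    # which on YARN is the container host, not the submitting machine.
    if not os.path.isfile(ticket_path):
        problems.append(f"ticket cache not found: {ticket_path}")
    elif not os.access(ticket_path, os.R_OK):
        problems.append(f"ticket cache not readable: {ticket_path}")

    for name in HDFS_ENV_VARS:
        if not environ.get(name):
            problems.append(f"{name} is not set")

    return problems
```

Calling `print(preflight("/tmp/krb5cc_44444"))` at the top of `test/run.py` would show, in the container logs, whether the ticket cache and the Hadoop environment are visible from the worker. An empty list does not prove the connection will succeed; it only rules out these particular local causes.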


            People

            • Assignee: Unassigned
            • Reporter: Saurabh Bajaj (sbajaj)