Apache Arrow / ARROW-6150

[Python] Intermittent HDFS error

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Not A Problem
    • Affects Version/s: 0.14.1
    • Fix Version/s: None
    • Component/s: Python
    • Labels:
      None

      Description

      I'm running a Dask-YARN job that dumps a results dictionary into HDFS using PyArrow's HDFS IO library (the call appears in the traceback below). The job fails intermittently with the error shown below: some runs succeed and others do not. I have not been able to determine the root cause.

       

      File "/extractor.py", line 87, in __call__
        json.dump(results_dict, fp=UTF8Encoder(f), indent=4)
      File "pyarrow/io.pxi", line 72, in pyarrow.lib.NativeFile.__exit__
      File "pyarrow/io.pxi", line 130, in pyarrow.lib.NativeFile.close
      File "pyarrow/error.pxi", line 87, in pyarrow.lib.check_status
      pyarrow.lib.ArrowIOError: HDFS CloseFile failed, errno: 255 (Unknown error 255) Please check that you are connecting to the correct HDFS RPC port
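
The traceback shows the error surfacing when the file is closed on leaving the `with` block, not during `json.dump` itself. Since the root cause is not identified here, the following is only a sketch of one possible mitigation, not a confirmed fix: serialize the dictionary up front, close the file explicitly, and retry the whole write if an I/O error occurs. The helper name `dump_json_with_retry` and the `open_file` callable are hypothetical, and the sketch assumes that `ArrowIOError` can be caught as `OSError` (I believe it derives from `IOError`, but that should be verified for the installed version).

```python
import json
import time


def dump_json_with_retry(open_file, results, attempts=3, delay=1.0):
    """Write `results` as UTF-8 JSON, retrying the whole write on I/O errors.

    `open_file` is a hypothetical zero-argument callable that returns a
    writable binary file object (for example, one wrapping an HDFS client).
    Returns the number of the attempt that succeeded.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    # Serialize once, before opening the file, so every attempt writes
    # the same complete payload.
    payload = json.dumps(results, indent=4).encode("utf-8")

    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            f = open_file()
            try:
                f.write(payload)
            finally:
                # Close explicitly: this is the step that failed in the report.
                f.close()
            return attempt
        except OSError as exc:  # assumption: ArrowIOError is an OSError subclass
            last_error = exc
            if attempt < attempts:
                time.sleep(delay)
    raise last_error
```

With the legacy HDFS client this might be called as `dump_json_with_retry(lambda: fs.open(path, "wb"), results_dict)`, where `fs` and `path` are placeholders for an existing connection and target path. Retrying hides an intermittent failure rather than explaining it, so logging each caught exception would still be worthwhile.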

            People

            • Assignee: Unassigned
            • Reporter: Saurabh Bajaj (sbajaj)
            • Votes: 0
            • Watchers: 2
