Uploaded image for project: 'Apache Arrow'
  1. Apache Arrow
  2. ARROW-6044

[Python] Pyarrow HDFS client gets hung after a while

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Feedback Received
    • Affects Version/s: 0.13.0
    • Fix Version/s: None
    • Component/s: Python
    • Labels:
    • Environment:
      hadoop-3.0.3
      driver='libhdfs'
      python 3.6
      Centos7

      Description

      I'm using the pyarrow HDFS client in a long running (forever) app that makes connections to HDFS as external requests come in and destroys the connection as soon as the request is handled. This happens a large amount of times on separate threads and everything works great.

      The problem is, after the app idles for a while (perhaps hours) and no HDFS connections are made during this time, when the next connection is attempted, the API hdfs.connect(...) just hangs. No exceptions are thrown.

      Code snippet on what i'm doing to instantiate each connection:

      ...

      hdfs = pyarrow.hdfs.connect(self.hdfs_authority, self.hdfs_port, user=self.hdfs_user)

      try:

      //Do something

      finally:

      hdfs.close

       

      Any help on what might be causing these hangs is appreciated

       

        Attachments

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              ftzeng82 Fred Tzeng
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: