Apache Arrow / ARROW-4943

pyarrow.lib.HadoopFileSystem._connect failed due to TypeError

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.12.1
    • Fix Version/s: None
    • Component/s: Python
    • Environment:
      Kernel: 4.4.95.x86_64
      Python: 2.7.5

    Description

       

      Running the pytorch_hello_world.py script from https://github.com/uber/petastorm.git fails with a TypeError, as shown in the traceback below.

      It seems that pyarrow.lib.HadoopFileSystem._connect requires a unicode argument, but the argument passed in is always a str. The proposed fix is to add a unicode() conversion so that the argument is always of type unicode.

      Traceback (most recent call last):
        File "pytorch_hello_world.py", line 31, in <module>
          pytorch_hello_world()
        File "pytorch_hello_world.py", line 25, in pytorch_hello_world
          with DataLoader(make_reader(dataset_url)) as train_loader:
        File "/usr/lib/python2.7/site-packages/petastorm/reader.py", line 132, in make_reader
          resolver = FilesystemResolver(dataset_url, hdfs_driver=hdfs_driver)
        File "/usr/lib/python2.7/site-packages/petastorm/fs_utils.py", line 83, in __init__
          self._filesystem = connector.connect_to_either_namenode(namenodes)
        File "/usr/lib/python2.7/site-packages/petastorm/hdfs/namenode.py", line 266, in connect_to_either_namenode
          return HAHdfsClient(cls, list_of_namenodes)
        File "/usr/lib/python2.7/site-packages/petastorm/hdfs/namenode.py", line 224, in __init__
          self._do_connect()
        File "/usr/lib/python2.7/site-packages/petastorm/hdfs/namenode.py", line 233, in _do_connect
          self._connector_cls._try_next_namenode(self._index_of_nn, self._list_of_namenodes)
        File "/usr/lib/python2.7/site-packages/petastorm/hdfs/namenode.py", line 289, in _try_next_namenode
          cls.hdfs_connect_namenode(urlparse('hdfs://' + str(host or 'default')))
        File "/usr/lib/python2.7/site-packages/petastorm/hdfs/namenode.py", line 250, in hdfs_connect_namenode
          return pyarrow.hdfs.connect(url.hostname or 'default', url.port or 8020, driver=driver)
        File "/usr/lib64/python2.7/site-packages/pyarrow/hdfs.py", line 209, in connect
          extra_conf=extra_conf)
        File "/usr/lib64/python2.7/site-packages/pyarrow/hdfs.py", line 39, in __init__
          self._connect(host, port, user, kerb_ticket, driver, extra_conf)
        File "pyarrow/io-hdfs.pxi", line 97, in pyarrow.lib.HadoopFileSystem._connect
      TypeError: Expected unicode, got str
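
The unicode() conversion proposed in the description can be sketched roughly as below. This is only an illustrative sketch and not the change that was made in pyarrow: `ensure_text` and `text_type` are hypothetical names introduced here, UTF-8 is an assumed encoding, and the report does not say which of the arguments passed to `_connect` triggered the error.

```python
import sys

# Hypothetical alias: the text type is `unicode` on Python 2 and `str` on Python 3.
# The conditional expression only evaluates `unicode` when running on Python 2.
text_type = unicode if sys.version_info[0] == 2 else str  # noqa: F821


def ensure_text(value, encoding="utf-8"):
    """Return `value` as a text (unicode) object.

    Byte strings (plain `str` on Python 2, `bytes` on Python 3) are decoded
    with `encoding` (assumed UTF-8); text objects and None pass through.
    """
    if value is None or isinstance(value, text_type):
        return value
    if isinstance(value, bytes):
        return value.decode(encoding)
    raise TypeError("expected a string, got %r" % type(value))
```

Under these assumptions, a Python 2 caller could wrap the string arguments before they reach pyarrow, for example `pyarrow.hdfs.connect(ensure_text(url.hostname or 'default'), url.port or 8020, driver=ensure_text(driver))`, mirroring the call shown in the traceback. On Python 3 the helper is effectively a no-op for ordinary `str` values.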

            People

              Assignee: Unassigned
              Reporter: vanderliang
              Votes: 0
              Watchers: 3

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1.5h