Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Fix Version/s: 8.0.0
Description
It looks like any attempt to read from S3 via pyarrow fails when access is supposed to go through AssumeRole (`role_arn`) and no `external_id` is passed to `S3FileSystem`.
As I understand it, `external_id` is an optional string passed to the AWS API. However, `S3FileSystem.__init__` defaults to `external_id=None` and later applies `tobytes()` to it unconditionally. Construction therefore fails whenever `external_id` is None: the `None` reaches Cython's conversion to a C++ `std::string`, which rejects it.
https://github.com/apache/arrow/blob/c72f84a48b4952796ab78a0c33b84a9fc8f893db/python/pyarrow/_s3fs.pyx#L230
This leads to an exception such as:
(...)
    df = cursor.execute(query+';').as_pandas()
  File "/opt/conda/lib/python3.9/site-packages/pyathena/util.py", line 37, in _wrapper
    return wrapped(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/pyathena/pandas/cursor.py", line 157, in execute
    self.result_set = AthenaPandasResultSet(
  File "/opt/conda/lib/python3.9/site-packages/pyathena/pandas/result_set.py", line 72, in __init__
    self._fs = self.__s3_file_system()
  File "/opt/conda/lib/python3.9/site-packages/pyathena/pandas/result_set.py", line 86, in __s3_file_system
    fs = fs.S3FileSystem(
  File "pyarrow/_s3fs.pyx", line 217, in pyarrow._s3fs.S3FileSystem.__init__
  File "stringsource", line 15, in string.from_py.__pyx_convert_string_from_py_std__in_string
TypeError: expected bytes, NoneType found
This exception occurs when pyarrow is used through the pyathena library, whose code does not pass any `external_id`.
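The failure mode can be illustrated without pyarrow or AWS credentials. The following is a minimal pure-Python model, not pyarrow's actual code: `tobytes`, `to_std_string`, `build_options_buggy` and `build_options_fixed` are hypothetical stand-ins, with `to_std_string` imitating the Cython `std::string` conversion seen in the traceback. The "fixed" variant assumes that an empty string is an acceptable way to express "no external ID"; the upstream fix may be implemented differently.

```python
from typing import Optional


def tobytes(o):
    # Hypothetical stand-in for a str -> bytes helper that, as described
    # in the report, passes non-str values (including None) through as-is.
    if isinstance(o, str):
        return o.encode("utf8")
    return o


def to_std_string(value) -> bytes:
    # Hypothetical stand-in for Cython's automatic conversion to a C++
    # std::string, which only accepts bytes-like input.
    if not isinstance(value, (bytes, bytearray)):
        raise TypeError(f"expected bytes, {type(value).__name__} found")
    return bytes(value)


def build_options_buggy(role_arn: str, external_id: Optional[str] = None) -> dict:
    # Mirrors the reported pattern: the optional argument defaults to None
    # and is still converted unconditionally.
    return {
        "role_arn": to_std_string(tobytes(role_arn)),
        "external_id": to_std_string(tobytes(external_id)),
    }


def build_options_fixed(role_arn: str, external_id: Optional[str] = None) -> dict:
    # One possible fix (assumption): map None to an empty string before
    # converting, so callers such as pyathena can omit the argument.
    if external_id is None:
        external_id = ""
    return {
        "role_arn": to_std_string(tobytes(role_arn)),
        "external_id": to_std_string(tobytes(external_id)),
    }


if __name__ == "__main__":
    # Placeholder ARN for illustration only.
    arn = "arn:aws:iam::123456789012:role/example"
    try:
        build_options_buggy(arn)
    except TypeError as exc:
        print(exc)  # expected bytes, NoneType found
    print(build_options_fixed(arn)["external_id"])  # b''
```

The buggy variant raises the same `TypeError` message as the traceback as soon as `external_id` is left at its default, while passing an explicit string works in both variants.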