Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
2.0.0
Description
I am using the Fileystem abstraction to write out html / text files to the local filesystem as well as s3.
I noticed that when using s3_fs.open_output_stream in combination with file.write(bytes), the object that gets created has a Content-Type of 'application/xml' even tough it's plain text, which is problematic for me.
Here is a minimal example:
import boto3 BUCKET = "my-bucket" path = f"s3://{BUCKET}/pyarrow_encoding.txt" s3_fs, output_path = FileSystem.from_uri(path) with s3_fs.open_output_stream(path=output_path, compression=None) as f: f.write('hello'.encode('UTF-8')) s3 = boto3.client('s3') response = s3.get_object(Bucket=BUCKET, Key='pyarrow_encoding.txt') print(response['ContentType']) # Output: application/xml print(response['Body'].read().decode('UTF-8')) # Output: hello s3.put_object(Bucket=BUCKET, Key='boto3_encoding.txt', Body='hello'.encode('UTF-8')) response = s3.get_object(Bucket=BUCKET, Key='boto3_encoding.txt') print(response['ContentType']) # Output: binary/octet-stream print(response['Body'].read().decode('UTF-8')) # Output: hello
I know, that the S3Filesystem implementation of pyarrow might no have mime type inference implemented, but I am wondering, why always 'application/xml' is the resulting Content-Type? Maybe this is hardcoded somewhere?
Originally, I tried this with '.html' files and also there, the objects on s3 always got the 'application/xml' Content-Type. (Please also see attachment from the s3 console)
Any help or pointer is appreciated.
Thank you,
Nicolas
Attachments
Attachments
Issue Links
- links to