Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Not A Bug
- Affects Version: 3.0.0
- Fix Version: None
- Component: None
Description
When writing a `uint32` column, the Parquet logical type is not written, which limits interoperability with other engines.
Minimal Python reproduction:
```
import pyarrow as pa
import pyarrow.parquet as pq

data = {"uint32": [1, None, 0]}
schema = pa.schema([pa.field("uint32", pa.uint32())])
t = pa.table(data, schema=schema)
pq.write_table(t, "bla.parquet")
```
Inspecting the file with Spark:
```
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
df = spark.read.parquet("bla.parquet")
print(df.select("uint32").schema)
```
shows `StructType(List(StructField(uint32,LongType,true)))`. `LongType` indicates that the field is interpreted as a 64-bit signed integer. Further inspection of the metadata shows that neither `convertedType` nor `logicalType` is set. Note that this is independent of the Arrow-specific schema written in the file metadata.
Attachments
Issue Links
- is related to
  - ARROW-12203 [C++][Python] Switch default Parquet version to 2.4 (Resolved)