  1. Apache Arrow
  2. ARROW-12201

[C++] [Parquet] Writing uint32 does not preserve parquet's LogicalType

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Bug
    • Affects Version/s: 3.0.0
    • Fix Version/s: None
    • Component/s: C++, Parquet
    • Labels: None

    Description

      When a `uint32` column is written, Parquet's logical type annotation is not written to the file, which limits interoperability with other engines.

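      Minimal Python example:

      ```
      import pyarrow as pa
      import pyarrow.parquet as pq  # the parquet submodule must be imported explicitly

      # Column name mapped to its values; None marks a null entry.
      data = {"uint32": [1, None, 0]}

      schema = pa.schema([pa.field('uint32', pa.uint32())])

      t = pa.table(data, schema=schema)
      pq.write_table(t, "bla.parquet")
      ```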
       
      Inspecting it with Spark:

      ```
      from pyspark.sql import SparkSession

      spark = SparkSession.builder.getOrCreate()

      df = spark.read.parquet("bla.parquet")
      print(df.select("uint32").schema)
      ```

      shows `StructType(List(StructField(uint32,LongType,true)))`. `LongType` indicates that Spark interprets the field as a 64-bit integer. Further inspection of the file metadata shows that neither convertedType nor logicalType is set. Note that this is independent of the Arrow-specific schema stored in the file's metadata.

            People

              Assignee: Unassigned
              Reporter: Jorge Leitão (jorgecarleitao)
              Votes: 0
              Watchers: 5
