Apache Arrow / ARROW-5888

[Python][C++] Add metadata to store Arrow time zones in Parquet file metadata

    Details

      Description

      Time zones other than UTC do not survive a round trip through Parquet: a field written as timestamp[us, tz=Europe/Berlin] is read back as timestamp[us, tz=UTC]. The expected behavior is that the original time zone is reconstructed on read.

      import pyarrow as pa
      import pyarrow.parquet as pq

      # One field per case: no time zone, UTC, and a non-UTC zone.
      schema = pa.schema(
          [
              pa.field("no_tz", pa.timestamp('us')),
              pa.field("utc", pa.timestamp('us', tz="UTC")),
              pa.field("europe", pa.timestamp('us', tz="Europe/Berlin")),
          ]
      )
      # Write only the schema as Parquet metadata into an in-memory buffer.
      buf = pa.BufferOutputStream()
      pq.write_metadata(
          schema,
          buf,
          coerce_timestamps="us"
      )

      # Read the metadata back and convert it to an Arrow schema.
      pq_bytes = buf.getvalue().to_pybytes()
      reader = pa.BufferReader(pq_bytes)
      parquet_file = pq.ParquetFile(reader)
      parquet_file.schema.to_arrow_schema()
      # Output:
      # no_tz: timestamp[us]
      # utc: timestamp[us, tz=UTC]
      # europe: timestamp[us, tz=UTC]
      

              People

              • Assignee: Wes McKinney (wesmckinn)
              • Reporter: Florian Jetter (fjetter)

                  Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 40m