Apache Arrow / ARROW-11388

[Python] Dataset Timezone Handling

Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.0.0, 3.0.0
    • Fix Version/s: None
    • Component/s: Python
    • Labels: None

    Description

      I'm trying to write a pandas DataFrame that has a timezone-aware DatetimeIndex to a pyarrow dataset, but the timezone information doesn't seem to be written (apart from in the pandas metadata).

       

      For example:

       

      import os
      import pandas as pd
      import numpy as np
      import pyarrow as pa
      import pyarrow.parquet as pq
      import pyarrow.dataset as ds
      
      from pathlib import Path
      
      # I've tried with both v2.0 and v3.0 today
      print(pa.__version__)
      
      # create dummy dataframe with datetime index containing tz info
      df = pd.DataFrame(
          dict(
              timestamp=pd.date_range("2021-01-01", freq="1min", periods=100, tz="US/Eastern"),
              x=np.arange(100),
          )
      ).set_index("timestamp")
      
      test_dir = Path("test_dir")
      table = pa.Table.from_pandas(df)
      schema = table.schema
      
      print(schema)
      print(schema.pandas_metadata)
      
      # warning - creates dir in cwd
      pq.write_to_dataset(table, test_dir)
      
      # timestamp column is read back with unit us and tz=UTC, not US/Eastern
      print(pq.ParquetFile(test_dir / os.listdir(test_dir)[0]).read())
      
      # create dataset using schema from earlier
      dataset = ds.dataset(test_dir, format="parquet", schema=schema)
      
      # this is the step that doesn't work: reading with the tz-aware schema
      dataset.to_table()
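
For context on the "us and UTC" observation above: a Parquet timestamp column holds an integer count of time units since the Unix epoch, and the Parquet timestamp type records only the unit and whether values are normalized to UTC, not a zone name such as US/Eastern. The sketch below uses only the standard library to illustrate that the instant survives this normalization and only the zone label is lost. The helper names `to_utc_micros` and `from_utc_micros` are hypothetical, and a fixed UTC-05:00 offset is assumed as a stand-in for US/Eastern in January; it is not what pyarrow does internally.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-05:00 offset stands in for "US/Eastern" in January
# (no daylight saving time in effect); a real zone needs a tz database.
EASTERN_STANDARD = timezone(timedelta(hours=-5), name="EST")
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_utc_micros(ts: datetime) -> int:
    """Return microseconds since the Unix epoch for a tz-aware datetime."""
    if ts.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return (ts - EPOCH) // timedelta(microseconds=1)


def from_utc_micros(micros: int, tz: timezone = timezone.utc) -> datetime:
    """Rebuild a datetime from epoch microseconds, displayed in tz."""
    return (EPOCH + timedelta(microseconds=micros)).astimezone(tz)


# First value of the index in the report: midnight, 1 January 2021, Eastern.
local = datetime(2021, 1, 1, 0, 0, tzinfo=EASTERN_STANDARD)
stored = to_utc_micros(local)  # the integer a UTC-normalized column would hold

as_utc = from_utc_micros(stored)
as_eastern = from_utc_micros(stored, EASTERN_STANDARD)

print(stored)                  # 1609477200000000
print(as_utc.isoformat())      # 2021-01-01T05:00:00+00:00
print(as_eastern.isoformat())  # 2021-01-01T00:00:00-05:00
```

If that reading is right, the original zone has to come back from metadata stored alongside the column (such as the pandas metadata printed earlier) rather than from the column values themselves.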
      

       

       

      Is this a bug or am I missing something?

      Thanks

      Andy

       

          People

            Assignee: Unassigned
            Reporter: Andy Douglas (andydoug)
            Votes: 0
            Watchers: 3
