Apache Arrow / ARROW-1435

[Python] PyArrow not propagating timezone information from Parquet to Python

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.6.0
    • Fix Version/s: 0.7.0
    • Component/s: Python
    • Labels: None

    Description

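      PyArrow reads timezone metadata for Timestamp values from Parquet, but does not propagate it to the resulting Python datetime objects. In the session below, the DateAware column is written as timestamp[ns, tz=UTC] but read back as timestamp[us] with no timezone, even though the pandas metadata on the table still records "timezone": "UTC". After to_pandas(), the column is timezone-naive.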

      λ python
      Python 3.5.2 |Continuum Analytics, Inc.| (default, Jul  5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)] on win32
      Type "help", "copyright", "credits" or "license" for more information.
      >>> import pyarrow as pa
      >>> import pyarrow.parquet as pq
      >>> import pytz
      >>> import pandas
      >>> from datetime import datetime
      >>>
      >>> d1 = datetime.strptime('2015-07-05 23:50:00', '%Y-%m-%d %H:%M:%S')
      >>> d1
      datetime.datetime(2015, 7, 5, 23, 50)
      >>> aware = pytz.utc.localize(d1)
      >>> aware
      datetime.datetime(2015, 7, 5, 23, 50, tzinfo=<UTC>)
      >>>
      >>> df = pandas.DataFrame()
      >>> df['DateNaive'] = [d1]
      >>> df['DateAware'] = [aware]
      >>> df
                  DateNaive                 DateAware
      0 2015-07-05 23:50:00 2015-07-05 23:50:00+00:00
      >>>
      >>> table  = pa.Table.from_pandas(df)
      >>> table
      pyarrow.Table
      DateNaive: timestamp[ns]
      DateAware: timestamp[ns, tz=UTC]
      __index_level_0__: int64
      -- metadata --
      pandas: {"pandas_version": "0.20.3", "columns": [{"name": "DateNaive", "pandas_type": "datetime", "numpy_type": "datetime64[ns]", "metadata": null}, {"name": "DateAware", "pandas_type": "datetimetz", "numpy_type": "datetime64[ns, UTC]", "metadata": {"timezone": "UTC"}}], "index_columns": ["__index_level_0__"]}
      >>>
      >>> pq.write_table(table, "E:\\pyarrowDates.parquet")
      >>>
      >>> pyarrowTable = pq.read_table("E:\\pyarrowDates.parquet")
      >>> pyarrowTable
      pyarrow.Table
      DateNaive: timestamp[us]
      DateAware: timestamp[us]
      __index_level_0__: int64
      -- metadata --
      pandas: {"pandas_version": "0.20.3", "columns": [{"name": "DateNaive", "pandas_type": "datetime", "numpy_type": "datetime64[ns]", "metadata": null}, {"name": "DateAware", "pandas_type": "datetimetz", "numpy_type": "datetime64[ns, UTC]", "metadata": {"timezone": "UTC"}}], "index_columns": ["__index_level_0__"]}
      >>>
      >>> pyarrowDF = pyarrowTable.to_pandas()
      >>> pyarrowDF
                  DateNaive           DateAware
      0 2015-07-05 23:50:00 2015-07-05 23:50:00
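
      Why the lost timezone matters on the Python side can be sketched with the standard library alone. This is a minimal illustration, not the fix applied in PyArrow: it uses datetime.timezone.utc instead of pytz, the helper names is_aware and restore_utc are hypothetical, and the workaround assumes the stored values are known to be UTC, as they are for the DateAware column in this report.

```python
from datetime import datetime, timezone

# Same wall-clock value as in the report, without and with tzinfo.
naive = datetime(2015, 7, 5, 23, 50)
aware = naive.replace(tzinfo=timezone.utc)


def is_aware(dt):
    """Return True if dt carries a usable UTC offset."""
    return dt.tzinfo is not None and dt.utcoffset() is not None


def restore_utc(dt):
    """Hypothetical helper: reattach UTC to a value known to be stored as UTC."""
    if is_aware(dt):
        # Already aware: normalise to UTC instead of overwriting the offset.
        return dt.astimezone(timezone.utc)
    # Naive: assume (per the caller's knowledge) that the value is in UTC.
    return dt.replace(tzinfo=timezone.utc)


# A naive and an aware datetime never compare equal ...
values_equal = naive == aware

# ... and ordering them raises TypeError.
try:
    naive < aware
    ordering_failed = False
except TypeError:
    ordering_failed = True

restored = restore_utc(naive)
print(is_aware(naive), is_aware(restored), values_equal, ordering_failed)
```

      A value that loses its timezone on the round trip therefore no longer compares against the aware value it was written from, until the timezone is reattached by the caller.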
      
      

          People

            Assignee: Wes McKinney (wesm)
            Reporter: Lucas Pickup (LucasPickup)