Apache Arrow / ARROW-3703

[Python] DataFrame.to_parquet crashes if datetime column has time zones

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.11.1
    • Fix Version/s: 0.12.0
    • Component/s: Python
    • Environment: pandas 0.23.4
      pyarrow 0.11.1
      Python 2.7, 3.5-3.7
      macOS High Sierra (10.13.6)

    Description

      On CPython 2.7.15, 3.5.6, 3.6.6, and 3.7.0, a pandas DataFrame with a datetime.datetime column serializes to Parquet just fine, but DataFrame.to_parquet crashes with an AttributeError once the datetimes use one of the built-in timezone objects (datetime.timezone) as their tzinfo.

      To reproduce, on Python 3:

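      import datetime as dt
      import pandas as pd

      # A timezone-aware datetime using the standard library's built-in UTC object
      df = pd.DataFrame({'foo': [dt.datetime(2018, 1, 1, 1, 23, 45, tzinfo=dt.timezone.utc)]})
      df.to_parquet('data.parq')  # raises AttributeError with pyarrow 0.11.1
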
      On Python 2, which has no built-in datetime.timezone class, create a subclass of datetime.tzinfo as shown here and try the same thing.
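      For the Python 2 case, a minimal hand-written tzinfo subclass might look like the sketch below. The class name UTC and the constant ZERO are illustrative assumptions, loosely modeled on the fixed-offset examples in the Python datetime documentation; the code runs on both Python 2 and Python 3. Like datetime.timezone, the object has no pytz-style zone attribute.

```python
import datetime as dt

# Illustrative fixed-offset UTC tzinfo subclass (hypothetical names).
# Python 2 has no datetime.timezone, so a subclass like this is needed there.
ZERO = dt.timedelta(0)


class UTC(dt.tzinfo):
    """UTC implemented as a hand-written tzinfo subclass."""

    def utcoffset(self, d):
        return ZERO

    def tzname(self, d):
        return "UTC"

    def dst(self, d):
        return ZERO


utc = UTC()
stamp = dt.datetime(2018, 1, 1, 1, 23, 45, tzinfo=utc)

print(stamp.isoformat())     # 2018-01-01T01:23:45+00:00
# No pytz-style 'zone' attribute, the same gap the traceback reports
# for datetime.timezone.
print(hasattr(utc, "zone"))  # False
```

      Putting datetimes built with such an object into a DataFrame column and calling to_parquet is the Python 2 variant of the reproduction.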

       

      The following exception results:

      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
        File "/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pandas/core/frame.py", line 1945, in to_parquet
          compression=compression, **kwargs)
        File "/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pandas/io/parquet.py", line 257, in to_parquet
          return impl.write(df, path, compression=compression, **kwargs)
        File "/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pandas/io/parquet.py", line 118, in write
          table = self.api.Table.from_pandas(df)
        File "pyarrow/table.pxi", line 1217, in pyarrow.lib.Table.from_pandas
        File "/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 381, in dataframe_to_arrays
          convert_types)]
        File "/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 380, in <listcomp>
          for c, t in zip(columns_to_convert,
        File "/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 370, in convert_column
          return pa.array(col, type=ty, from_pandas=True, safe=safe)
        File "pyarrow/array.pxi", line 167, in pyarrow.lib.array
        File "/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 409, in get_datetimetz_type
          type_ = pa.timestamp(unit, tz)
        File "pyarrow/types.pxi", line 1038, in pyarrow.lib.timestamp
        File "pyarrow/types.pxi", line 955, in pyarrow.lib.tzinfo_to_string
      AttributeError: 'datetime.timezone' object has no attribute 'zone'
      
      This doesn't happen if you use pytz.UTC as the timezone object; pytz timezones provide the zone attribute that tzinfo_to_string looks up in the traceback.
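      The traceback suggests that pyarrow 0.11.1 only handles tzinfo objects carrying a pytz-style zone name. Below is a minimal sketch, not pyarrow's actual 0.12.0 fix, of a hypothetical helper tzinfo_to_string_sketch that prefers a zone name when present and otherwise falls back to a fixed UTC offset string such as "+05:30". The Arrow schema permits absolute offsets of that "+XX:XX"/"-XX:XX" form as a timestamp time zone.

```python
import datetime as dt


def tzinfo_to_string_sketch(tz):
    """Hypothetical helper: convert a tzinfo object to a timezone string.

    Illustration only, not pyarrow's implementation. pytz-style objects
    expose a 'zone' name; fixed-offset objects such as datetime.timezone
    do not, so those fall back to a '+HH:MM' / '-HH:MM' offset string.
    """
    zone = getattr(tz, "zone", None)
    if zone is not None:
        # pytz timezones carry an Olson name, e.g. 'UTC' or 'Europe/Paris'
        return zone

    # datetime.timezone.utcoffset accepts None; date-dependent zones
    # typically return None here, so they are rejected.
    offset = tz.utcoffset(None)
    if offset is None:
        raise ValueError("cannot derive a fixed offset from %r" % (tz,))

    seconds = int(offset.total_seconds())
    sign = "-" if seconds < 0 else "+"
    hours, remainder = divmod(abs(seconds), 3600)
    # Any sub-minute part of the offset is dropped.
    return "%s%02d:%02d" % (sign, hours, remainder // 60)


print(tzinfo_to_string_sketch(dt.timezone.utc))  # +00:00
print(tzinfo_to_string_sketch(dt.timezone(dt.timedelta(hours=-5, minutes=-30))))  # -05:30
```

      Handling the offset case separately is what lets datetime.timezone objects through without an AttributeError; the change actually shipped in pyarrow 0.12.0 may work differently.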

People

    • Assignee: Krisztian Szucs (kszucs)
    • Reporter: Diego Argueta (yiannisliodakis)
    • Votes: 0
    • Watchers: 4

Time Tracking

    • Original Estimate: Not Specified
    • Remaining Estimate: 0h
    • Time Spent: 1.5h