  Apache Arrow / ARROW-16540

Support storing different timezone in an array


Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Format, Python
    • Labels: None

    Description

      As a user, I would like to use pyarrow to store a column of datetimes with different timezones. For some datasets (e.g., taxi pickups), a column with mixed timezones is the natural representation. Even if the data is limited to a single location (for example, a business in NYC) over a single year, the values will carry two UTC offsets: EDT (-4:00) and EST (-5:00).

       

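      Currently, it is not possible to keep a column with different timezones: an Arrow timestamp type carries at most one timezone as part of the type itself, and each value is stored as a UTC-based count since the Unix epoch, so every value in the array shares that single timezone.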
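      A minimal pure-Python sketch of why the timezone gets lost, assuming only Arrow's storage model for timestamps: each value is an int64 count since the Unix epoch in UTC, and the timezone (if any) belongs to the column's type rather than to individual values. The helpers `to_epoch_micros` and `from_epoch_micros` are hypothetical stand-ins for what Arrow does internally, and fixed UTC offsets replace real US/Eastern and US/Pacific zones (both are on standard time on 2010-01-01).

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets stand in for US/Eastern and US/Pacific on
# 2010-01-01, when both zones are on standard time.
EST = timezone(timedelta(hours=-5), "EST")
PST = timezone(timedelta(hours=-8), "PST")
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_micros(dt: datetime) -> int:
    """Hypothetical helper: encode an aware datetime as microseconds since
    the Unix epoch in UTC, mirroring how a timestamp[us] value is stored."""
    return (dt - EPOCH) // timedelta(microseconds=1)


def from_epoch_micros(us: int, tz: timezone) -> datetime:
    """Hypothetical helper: decode a stored value into the one timezone
    attached to the column type."""
    return (EPOCH + timedelta(microseconds=us)).astimezone(tz)


values = [
    datetime(2010, 1, 1, 9, 0, 0, tzinfo=EST),  # 9AM ET
    datetime(2010, 1, 1, 6, 0, 0, tzinfo=PST),  # 6AM PT, the same instant
]

# Both values encode to the same integer: 2010-01-01 14:00 UTC.
encoded = [to_epoch_micros(v) for v in values]
print(encoded)  # [1262354400000000, 1262354400000000]

# The column carries exactly one timezone, so both values decode to 9AM EST.
column_tz = EST
decoded = [from_epoch_micros(us, column_tz) for us in encoded]
print([d.hour for d in decoded])  # [9, 9]
```

      Because the two instants encode to the same integer, nothing in the stored values records that the second one was originally 6AM Pacific.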

       

      import pyarrow as pa
      import pytz
      from datetime import datetime

      eastern = pytz.timezone('US/Eastern')
      pacific = pytz.timezone('US/Pacific')

      # Use localize() rather than datetime(..., tzinfo=...): passing a pytz zone
      # directly attaches the zone's LMT offset (-4:56 for US/Eastern).
      arr = pa.array([
          eastern.localize(datetime(2010, 1, 1, 9, 0, 0)),  # 9AM ET
          pacific.localize(datetime(2010, 1, 1, 6, 0, 0)),  # 6AM PT, i.e. 9AM ET
      ])

      # The array has a single timestamp type, so all values share one timezone.
      print(arr.type)

      # Desired behavior: each value round-trips with its own timezone.
      assert arr[0].as_py().hour == 9  # 9AM ET
      assert arr[1].as_py().hour == 6  # fails: the US/Pacific zone of index 1 is not kept

       

      > Both datetime values represent the same instant, expressed in different timezones.
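      Until per-value timezones are supported, one possible workaround (an assumption, not an existing Arrow feature) is to store the UTC instant in one column and each row's original UTC offset in a second integer column, for example as the two fields of a struct. This sketch models those two columns in plain Python; `split` and `join` are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ONE_US = timedelta(microseconds=1)
ONE_MIN = timedelta(minutes=1)


def split(dt: datetime) -> tuple[int, int]:
    """Hypothetical helper: split an aware datetime into
    (UTC microseconds since the epoch, UTC offset in minutes)."""
    return (dt - EPOCH) // ONE_US, dt.utcoffset() // ONE_MIN


def join(us: int, offset_min: int) -> datetime:
    """Hypothetical helper: rebuild the aware datetime from the two columns."""
    tz = timezone(timedelta(minutes=offset_min))
    return (EPOCH + timedelta(microseconds=us)).astimezone(tz)


values = [
    datetime(2010, 1, 1, 9, 0, tzinfo=timezone(timedelta(hours=-5))),  # 9AM EST
    datetime(2010, 1, 1, 6, 0, tzinfo=timezone(timedelta(hours=-8))),  # 6AM PST
]

# Column 1: identical UTC instants; column 2: the per-row offsets.
instants, offsets = zip(*(split(v) for v in values))
print(offsets)  # (-300, -480)

# Each row comes back with its own wall-clock time...
restored = [join(us, m) for us, m in zip(instants, offsets)]
print([r.hour for r in restored])  # [9, 6]

# ...while still comparing equal as the same instant.
assert restored[0] == restored[1]
```

      Storing a fixed offset recovers the original wall-clock time but not the DST rules; storing an IANA zone name such as "America/New_York" per row instead would keep those rules, at the cost of a string (or dictionary-encoded) column.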

       

          People

            Assignee: Unassigned
            Reporter: Gaurav Sheni (gsheni)
            Votes: 0
            Watchers: 3
