Apache Arrow / ARROW-5125

[Python] Cannot roundtrip extreme dates through pyarrow


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.13.0
    • Fix Version/s: 0.15.0
    • Component/s: Python
    • Environment: Windows 10, Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 22:22:05)

    Description

      You can roundtrip many dates through a pyarrow array:

       

      >>> pa.array([datetime.date(1980, 1, 1)], type=pa.date32())[0]
      datetime.date(1980, 1, 1)

       

      But (on Windows at least), not extreme ones:

       

      >>> pa.array([datetime.date(1960, 1, 1)], type=pa.date32())[0]
      Traceback (most recent call last):
       File "<stdin>", line 1, in <module>
       File "pyarrow\scalar.pxi", line 74, in pyarrow.lib.ArrayValue.__repr__
       File "pyarrow\scalar.pxi", line 226, in pyarrow.lib.Date32Value.as_py
      OSError: [Errno 22] Invalid argument
      >>> pa.array([datetime.date(3200, 1, 1)], type=pa.date32())[0]
      Traceback (most recent call last):
       File "<stdin>", line 1, in <module>
       File "pyarrow\scalar.pxi", line 74, in pyarrow.lib.ArrayValue.__repr__
       File "pyarrow\scalar.pxi", line 226, in pyarrow.lib.Date32Value.as_py
      

      This happens because datetime.utcfromtimestamp and datetime.timestamp fail on these dates. It seems we should be able to avoid calling these functions entirely when deserializing dates. Ideally we would be able to roundtrip these values as datetimes too, but it is less clear that this will be easy. For context, see https://bugs.python.org/issue29097.
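
      As a rough illustration of the idea above (not the actual pyarrow fix), a date32 value can be treated as a signed count of days since 1970-01-01 and converted with plain timedelta arithmetic, which never touches the platform's timestamp functions. The helper names below are hypothetical and exist only for this sketch:

```python
import datetime

# Assumption: a date32 value is a signed number of days since the UNIX epoch.
_EPOCH_DATE = datetime.date(1970, 1, 1)
_EPOCH_DATETIME = datetime.datetime(1970, 1, 1)


def days_to_date(days):
    """Convert days since the epoch to a date without utcfromtimestamp."""
    return _EPOCH_DATE + datetime.timedelta(days=days)


def date_to_days(value):
    """Convert a date to days since the epoch without datetime.timestamp."""
    return (value - _EPOCH_DATE).days


def millis_to_datetime(millis):
    """Same approach for a naive UTC datetime stored as epoch milliseconds."""
    return _EPOCH_DATETIME + datetime.timedelta(milliseconds=millis)


# Round-trip the dates from the report using only date arithmetic.
for d in (datetime.date(1960, 1, 1), datetime.date(3200, 1, 1)):
    assert days_to_date(date_to_days(d)) == d
```

      Because this relies only on Python's own date arithmetic, it should behave the same on every platform for any value within the range that datetime.date supports.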

      This may be related to ARROW-3176 and ARROW-4746.

            People

              Assignee: Micah Kornfield (emkornfield@gmail.com)
              Reporter: Max Bolingbroke (batterseapower)
              Votes: 0
              Watchers: 4

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 1h 10m