Apache Arrow / ARROW-10213

[C++] Temporal cast from timestamp to date rounds instead of extracting date component


Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: 1.0.1
    • Fix Version/s: 6.0.0
    • Component/s: C++
    • Labels: None

    Description

      I'd expect this code to give 1950-01-01 twice (i.e. a timestamp -> date cast extracts the date component, ignoring the time component):

      import datetime
      import pyarrow as pa
      arr = pa.array([
          datetime.datetime(1950, 1, 1, 0, 0, 0),
          datetime.datetime(1950, 1, 1, 12, 0, 0),
      ], type=pa.timestamp("ns"))
      print(arr)
      # safe=False permits a lossy cast, i.e. one that drops the time-of-day component
      print(arr.cast(pa.date32(), safe=False))

      However, it gives 1950-01-02 for the second element:

      [
        1950-01-01 00:00:00.000000000,
        1950-01-01 12:00:00.000000000
      ]
      [
        1950-01-01,
        1950-01-02
      ]
      

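      The reason is that the temporal cast simply divides the timestamp value by the number of units per day, and C++ integer division truncates towards zero (note: Python's // operator floors towards negative infinity, so it would give the right answer in this case!). Since 1950-01-01 12:00 lies -7304.5 days from the epoch, the cast yields -7304 days (1950-01-02) instead of -7305 (1950-01-01).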
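
      As a minimal illustration in plain Python (no pyarrow needed), the snippet below reproduces the arithmetic: it computes each timestamp's nanosecond offset from the epoch and divides it by the number of nanoseconds per day, once with a helper emulating C/C++ truncating division and once with Python's floor division. The helpers to_ns(), trunc_div() and day_to_date() are hypothetical names for this sketch, not Arrow functions.

```python
from datetime import date, datetime, timedelta

EPOCH = datetime(1970, 1, 1)
NS_PER_DAY = 86_400 * 10**9


def to_ns(dt: datetime) -> int:
    """Nanoseconds since the Unix epoch, computed exactly with integers."""
    return (dt - EPOCH) // timedelta(microseconds=1) * 1_000


def trunc_div(a: int, b: int) -> int:
    """Integer division rounding toward zero, as C and C++ do for a / b."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def day_to_date(days: int) -> date:
    """Interpret a day count since the epoch as a calendar date (like date32)."""
    return date(1970, 1, 1) + timedelta(days=days)


for dt in (datetime(1950, 1, 1, 0, 0, 0), datetime(1950, 1, 1, 12, 0, 0)):
    ns = to_ns(dt)
    truncated = trunc_div(ns, NS_PER_DAY)  # C++-style division
    floored = ns // NS_PER_DAY             # Python floor division
    print(dt, truncated, day_to_date(truncated), floored, day_to_date(floored))
# 1950-01-01 00:00:00 -7305 1950-01-01 -7305 1950-01-01
# 1950-01-01 12:00:00 -7304 1950-01-02 -7305 1950-01-01
```

      Both divisions agree at midnight because the division is exact; they differ only when a negative timestamp has a nonzero time-of-day component.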

      Depending on the intended semantics of a temporal cast, either the cast should be fixed to extract the date component, or its rounding behavior (truncation towards zero) should be documented and a separate kernel implemented for extracting the date component.
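
      The "extract the date component" semantics amount to floor division by the number of units per day. Since C++ only offers truncating integer division, a kernel would need a correction step. The following is a hypothetical Python model of such a kernel, not Arrow's actual C++ code: UNITS_PER_DAY, floor_div() and timestamp_to_date32() are illustrative names, and nulls are modeled as None.

```python
from typing import List, Optional

# Timestamp units and how many of each make up one day (assumed unit labels).
UNITS_PER_DAY = {
    "s": 86_400,
    "ms": 86_400 * 10**3,
    "us": 86_400 * 10**6,
    "ns": 86_400 * 10**9,
}


def trunc_div(a: int, b: int) -> int:
    """Division rounding toward zero, mimicking C++ integer a / b."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def floor_div(a: int, b: int) -> int:
    """Floor division built from truncating division, as a C++ kernel would need."""
    q = trunc_div(a, b)
    # If the division is inexact and the operands have opposite signs,
    # truncation rounded toward zero (upward), so step down by one.
    if q * b != a and (a < 0) != (b < 0):
        q -= 1
    return q


def timestamp_to_date32(values: List[Optional[int]], unit: str) -> List[Optional[int]]:
    """Map timestamps (counts of `unit` since the epoch) to days since the epoch.

    Nulls (None) pass through unchanged.
    """
    per_day = UNITS_PER_DAY[unit]
    return [None if v is None else floor_div(v, per_day) for v in values]


# 1950-01-01T00:00 and 1950-01-01T12:00 in nanoseconds, plus a null.
print(timestamp_to_date32([-631152000000000000, -631108800000000000, None], "ns"))
# [-7305, -7305, None]
```

      With this rule both example timestamps map to day -7305, i.e. 1950-01-01, which is the result the report expects.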


People

    • Assignee: David Li (lidavidm)
    • Reporter: David Li (lidavidm)
    • Votes: 0
    • Watchers: 3
