Apache Arrow / ARROW-10213

[C++] Temporal cast from timestamp to date rounds instead of extracting date component

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 1.0.1
    • Fix Version/s: None
    • Component/s: C++
    • Labels: None

      Description

      I'd expect this code to print 1950-01-01 for both values (i.e., a timestamp -> date cast should extract the date component and ignore the time component):

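      import datetime
      import pyarrow as pa

      # Midnight and noon on the same day, stored as nanoseconds since the epoch
      arr = pa.array([
          datetime.datetime(1950, 1, 1, 0, 0, 0),
          datetime.datetime(1950, 1, 1, 12, 0, 0),
      ], type=pa.timestamp("ns"))
      print(arr)
      # safe=False permits the lossy cast that drops the time of day
      print(arr.cast(pa.date32(), safe=False))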

      However, it gives 1950-01-02 for the second value:

      [
        1950-01-01 00:00:00.000000000,
        1950-01-01 12:00:00.000000000
      ]
      [
        1950-01-01,
        1950-01-02
      ]
      

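      The reason is that the temporal cast simply divides the timestamp value by the length of a day in the timestamp's unit, and C++ integer division truncates toward zero. 1950-01-01 12:00 lies -7304.5 days from the epoch, so truncation yields -7304 days (1950-01-02) instead of -7305 (1950-01-01). (Note: Python's floor division `//` rounds toward negative infinity, so it would give the right answer in this case!)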
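The arithmetic can be reproduced in plain Python, without pyarrow. This is a minimal sketch assuming the cast divides the epoch nanosecond count by the number of nanoseconds in a day; `to_epoch_ns` and `trunc_div` are hypothetical helpers, the latter emulating C++ integer division (round toward zero) next to Python's flooring `//`.

```python
import datetime

NS_PER_DAY = 86_400 * 1_000_000_000
EPOCH = datetime.datetime(1970, 1, 1)


def to_epoch_ns(dt: datetime.datetime) -> int:
    """Hypothetical helper: nanoseconds since the Unix epoch for a naive datetime (assumed UTC)."""
    delta = dt - EPOCH  # timedelta normalizes to days plus non-negative seconds
    return ((delta.days * 86_400 + delta.seconds) * 1_000_000
            + delta.microseconds) * 1_000


def trunc_div(a: int, b: int) -> int:
    """Hypothetical helper: integer division rounding toward zero, like C++ `/`."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


for dt in (datetime.datetime(1950, 1, 1, 0, 0, 0),
           datetime.datetime(1950, 1, 1, 12, 0, 0)):
    ns = to_epoch_ns(dt)
    truncated = trunc_div(ns, NS_PER_DAY)  # round toward zero (the reported behavior)
    floored = ns // NS_PER_DAY             # round toward -infinity (the date component)
    print(f"{dt}  truncated={truncated} ({EPOCH.date() + datetime.timedelta(days=truncated)})"
          f"  floored={floored} ({EPOCH.date() + datetime.timedelta(days=floored)})")

# 1950-01-01 00:00:00  truncated=-7305 (1950-01-01)  floored=-7305 (1950-01-01)
# 1950-01-01 12:00:00  truncated=-7304 (1950-01-02)  floored=-7305 (1950-01-01)
```

The two divisions only disagree for negative values that are not an exact multiple of a day, which is why pre-1970 timestamps with a nonzero time of day are affected.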

      Depending on the intended semantics of a temporal cast, either the cast should be fixed to extract the date component, or its round-toward-zero behavior should be documented and a separate kernel implemented for extracting the date component.
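Extracting the date component means flooring instead of truncating. When the implementation language's `/` truncates (as in C++), a common fix-up is to take the truncated quotient and subtract one whenever the remainder is nonzero and its sign differs from the divisor's. The sketch below shows that approach in Python; `floor_div_from_trunc` and `timestamps_ns_to_days` are hypothetical names, and representing nulls as `None` is an assumption for illustration, not how an Arrow kernel stores validity.

```python
NS_PER_DAY = 86_400 * 1_000_000_000


def trunc_div(a: int, b: int) -> int:
    """Emulates C++ integer division, which rounds toward zero."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def floor_div_from_trunc(a: int, b: int) -> int:
    """Floor division built only from truncating division and a remainder check."""
    q = trunc_div(a, b)
    r = a - q * b  # remainder of the truncating division (same sign as a)
    # A nonzero remainder with the opposite sign to the divisor means the
    # quotient was rounded toward zero, i.e. upward; step it down by one.
    if r != 0 and (r < 0) != (b < 0):
        q -= 1
    return q


def timestamps_ns_to_days(values):
    """Hypothetical 'extract date component': epoch nanoseconds -> epoch days (date32)."""
    return [None if v is None else floor_div_from_trunc(v, NS_PER_DAY) for v in values]
```

For example, `timestamps_ns_to_days([-631108800 * 10**9])` returns `[-7305]`, i.e. 1950-01-01, for the noon timestamp from the description. The fix-up costs one comparison and a conditional decrement per value and leaves non-negative inputs unchanged.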

    People

    • Assignee: Unassigned
    • Reporter: David Li (lidavidm)
    • Votes: 0
    • Watchers: 2