Details
Type: Bug
Status: Closed
Priority: Minor
Resolution: Duplicate
Affects Version/s: 1.0.1
Fix Version/s: None
Description
I'd expect this code to give 1950-01-01 twice (i.e. a timestamp -> date cast extracts the date component, ignoring the time component):
```python
import datetime

import pyarrow as pa

arr = pa.array([
    datetime.datetime(1950, 1, 1, 0, 0, 0),
    datetime.datetime(1950, 1, 1, 12, 0, 0),
], type=pa.timestamp("ns"))
print(arr)
print(arr.cast(pa.date32(), safe=False))
```
However, it gives 1950-01-02 for the second value:
```
[
  1950-01-01 00:00:00.000000000,
  1950-01-01 12:00:00.000000000
]
[
  1950-01-01,
  1950-01-02
]
```
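The reason is that the temporal cast simply divides the timestamp by the length of a day, and C/C++ integer division truncates toward zero. 1950-01-01 12:00 lies -7304.5 days from the epoch, so truncation yields -7304 days (1950-01-02) instead of -7305 (1950-01-01). (Note: Python's floor division rounds toward negative infinity, so the same computation in Python would give the right answer in this case!)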
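The off-by-one can be reproduced with plain integer arithmetic. The sketch below computes the exact nanoseconds since the Unix epoch for 1950-01-01 12:00 and divides by the number of nanoseconds per day twice: once emulating C/C++ truncation toward zero and once with Python's floor division. The helpers `to_epoch_ns` and `trunc_div` are hypothetical, written only for this illustration.

```python
import datetime

NS_PER_DAY = 86_400 * 10**9
EPOCH = datetime.datetime(1970, 1, 1)


def to_epoch_ns(dt: datetime.datetime) -> int:
    """Exact integer nanoseconds since the Unix epoch (naive datetime treated as UTC)."""
    return (dt - EPOCH) // datetime.timedelta(microseconds=1) * 1000


def trunc_div(a: int, b: int) -> int:
    """Integer division rounding toward zero, like C/C++ `a / b` on integers."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q


ns = to_epoch_ns(datetime.datetime(1950, 1, 1, 12, 0, 0))
truncated = trunc_div(ns, NS_PER_DAY)  # -7304.5 rounded toward zero
floored = ns // NS_PER_DAY             # -7304.5 rounded toward -infinity

epoch_date = EPOCH.date()
print(truncated, epoch_date + datetime.timedelta(days=truncated))  # -7304 1950-01-02
print(floored, epoch_date + datetime.timedelta(days=floored))      # -7305 1950-01-01
```

Only the rounding direction differs; for timestamps at or after the epoch both divisions agree, which is why the bug only shows up for pre-1970 values with a nonzero time of day.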
Depending on the intended semantics of a temporal cast, either the cast should be fixed to extract the date component, or the rounding behavior should be documented and a separate kernel implemented for extracting the date component.
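If the cast is meant to extract the date component, the per-value computation only needs to switch from truncating to flooring division. Below is a minimal sketch in plain Python that operates on raw int64 epoch values rather than on a pyarrow array; the function name `extract_date32` and the `TICKS_PER_DAY` table are hypothetical and not part of the pyarrow API, nulls are modeled as `None`, and time zones are ignored.

```python
from typing import List, Optional

# Ticks per day for each timestamp unit (assumed unit names: s, ms, us, ns).
TICKS_PER_DAY = {
    "s": 86_400,
    "ms": 86_400 * 10**3,
    "us": 86_400 * 10**6,
    "ns": 86_400 * 10**9,
}


def extract_date32(values: List[Optional[int]], unit: str) -> List[Optional[int]]:
    """Map epoch-relative timestamps to days since the epoch (date32 values).

    Floor division sends every instant of a calendar day, including days
    before 1970, to that day's number; nulls pass through unchanged.
    """
    per_day = TICKS_PER_DAY[unit]
    return [None if v is None else v // per_day for v in values]


# 1950-01-01 00:00 and 12:00 in seconds, plus a null.
print(extract_date32([-631152000, -631108800, None], "s"))  # [-7305, -7305, None]
```

Both instants from the report map to day -7305 (1950-01-01), which is the result the issue expects.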
Issue Links
- duplicates: ARROW-13549 [C++] Implement timestamp to date/time cast that extracts value (Resolved)