Details
Type: Sub-task
Status: Resolved
Priority: Minor
Resolution: Fixed
Description
I have some data that is partitioned by year/month/date. It would be useful if the date segment could be parsed automatically:
In [17]: schema = pa.schema([("year", pa.int16()), ("month", pa.int8()), ("day", pa.date32())])
In [18]: partition = DirectoryPartitioning(schema)
In [19]: partition.parse("/2020/06/2020-06-08")
---------------------------------------------------------------------------
ArrowNotImplementedError                  Traceback (most recent call last)
<ipython-input-19-c227c808b401> in <module>
----> 1 partition.parse("/2020/06/2020-06-08")

~\envs\dev\lib\site-packages\pyarrow\_dataset.pyx in pyarrow._dataset.Partitioning.parse()

~\envs\dev\lib\site-packages\pyarrow\error.pxi in pyarrow.lib.pyarrow_internal_check_status()

~\envs\dev\lib\site-packages\pyarrow\error.pxi in pyarrow.lib.check_status()

ArrowNotImplementedError: parsing scalars of type date32[day]
This is not a big issue, since you can declare the field as a string and convert it afterwards, but it would nevertheless be nice if it Just Worked:
In [22]: schema = pa.schema([("year", pa.int16()), ("month", pa.int8()), ("day", pa.string())])
In [23]: partition = DirectoryPartitioning(schema)
In [24]: partition.parse("/2020/06/2020-06-08")
Out[24]: <pyarrow.dataset.AndExpression (((year == 2020:int16) and (month == 6:int8)) and (day == 2020-06-08:string))>
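For reference, the "use string and convert" workaround could look roughly like the sketch below. It uses only the Python standard library and does not touch pyarrow; `parse_partition_path` is a hypothetical helper named here for illustration, and it assumes the path always has exactly three segments with the last one in ISO `YYYY-MM-DD` form.

```python
from datetime import date


def parse_partition_path(path):
    """Split a '/year/month/date' path and convert each segment by hand.

    Hypothetical helper for illustration only; it is not part of pyarrow.
    """
    year, month, day = path.strip("/").split("/")
    return {
        "year": int(year),
        "month": int(month),
        # The segment arrives as a string and is converted after the fact.
        "day": date.fromisoformat(day),
    }


fields = parse_partition_path("/2020/06/2020-06-08")

# A date32 value is stored as the number of days since the Unix epoch,
# so the converted date maps onto it with a simple subtraction.
days_since_epoch = (fields["day"] - date(1970, 1, 1)).days
```

This is the kind of manual step the request is asking the partitioning parser to do on its own when the schema declares the field as `pa.date32()`.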