Apache Arrow / ARROW-6780

[C++][Parquet] Support DurationType in writing/reading parquet


Details

    Description

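      Currently, writing a table with a duration column to Parquet is not supported:

      In [35]: import pyarrow as pa

      In [36]: import pyarrow.parquet as pq

      In [37]: table = pa.table({'a': pa.array([1, 2], pa.duration('s'))})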
      
      In [39]: table
      Out[39]: 
      pyarrow.Table
      a: duration[s]
      
      In [41]: pq.write_table(table, 'test_duration.parquet')
      ...
      ArrowNotImplementedError: Unhandled type for Arrow to Parquet schema conversion: duration[s]
      

      Arrow's duration type has no direct mapping to a Parquet logical type. Parquet does have an INTERVAL type, but it more closely matches Arrow's interval types (YEAR_MONTH or DAY_TIME).

      However, the duration values could be stored as plain 64-bit integers, and the duration type could be restored from the serialized Arrow schema when the file is read back in.

            People

              jorisvandenbossche Joris Van den Bossche