Apache Arrow / ARROW-7907

[Python] Conversion to pandas of empty table with timestamp type aborts

    Description

      Creating an empty table:

      # assumes: import pyarrow as pa
      In [1]: table = pa.table({'a': pa.array([], type=pa.timestamp('us'))})

      In [2]: table['a']
      Out[2]:
      <pyarrow.lib.ChunkedArray object at 0x7fbb783e8098>
      [
        []
      ]

      In [3]: table.to_pandas()
      Out[3]:
      Empty DataFrame
      Columns: [a]
      Index: []


      The above works, but the ChunkedArray still has one empty chunk. When filtering data, you can end up with no chunks at all, and then the conversion fails:

      In [4]: table2 = table.slice(0, 0)

      In [5]: table2['a']
      Out[5]:
      <pyarrow.lib.ChunkedArray object at 0x7fbb783aa4a8>
      [

      ]

      In [6]: table2.to_pandas()
      ../src/arrow/table.cc:48:  Check failed: (chunks.size()) > (0) cannot construct ChunkedArray from empty vector and omitted type
      ...
      Aborted (core dumped)


      This seems to happen specifically for the timestamp type, and only with a non-ns unit (e.g. us as above, which is the default in Arrow).

      I noticed this when reading a Parquet file of the taxi dataset, where the filter I used resulted in an empty batch.

            People

              wesm Wes McKinney
              jorisvandenbossche Joris Van den Bossche
                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 40m