Apache Arrow / ARROW-2711

[Python/C++] Pandas-Arrow doesn't roundtrip when a column of lists has an empty first element


    Details

      Description

      Hi, I thought this had been fixed in the past, but this simple use case still breaks:

       

      import pandas as pd
      import pyarrow

      df = pd.DataFrame(dict(x=[[], ["a"]]))
      tbl = pyarrow.Table.from_pandas(df)
      print(tbl.schema)
      

      results in the wrong inferred type "list<item: null>" instead of "list<item: string>":

       

      x: list<item: null>
        child 0, item: null
      __index_level_0__: int64
      metadata
      --------
      {b'pandas': b'{"index_columns": ["__index_level_0__"], "column_indexes": [{"na'
                  b'me": null, "field_name": null, "pandas_type": "unicode", "numpy_'
                  b'type": "object", "metadata": {"encoding": "UTF-8"}}], "columns":'
                  b' [{"name": "x", "field_name": "x", "pandas_type": "list[empty]",'
                  b' "numpy_type": "object", "metadata": null}, {"name": null, "fiel'
                  b'd_name": "__index_level_0__", "pandas_type": "int64", "numpy_typ'
                  b'e": "int64", "metadata": null}], "pandas_version": "0.22.0"}'}

      When the Table is converted back to pandas, all list elements are now None too:

       

      df2 = tbl.to_pandas()
      print(df2)

              x
      0      []
      1  [None]
      

       

       


              People

              • Assignee: Antoine Pitrou (apitrou)
                Reporter: Thomas Buhrmann (buhrmann)
              • Votes: 0
                Watchers: 2

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 40m