Apache Arrow / ARROW-15142

Cannot mix struct and non-struct, non-null values error when saving nested types with PyArrow


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 6.0.1
    • Fix Version/s: None
    • Component/s: Python
    • Labels: None

    Description

      When saving a pandas DataFrame that contains a nested type (a list within a list, or a list within a dict) using the pyarrow engine, the following error is raised:

      ArrowInvalid: ('cannot mix list and non-list, non-null values', 'Conversion failed for column A with type object')

       

      Repro:

      import pandas as pd

      # Column "A" holds a single cell that mixes ints with a nested list
      x = pd.DataFrame({"A": [[24, 27, [1, 1]]]})
      x.to_parquet('/tmp/a.pqt', engine="pyarrow")

      After a bit of googling, this appears to be a known Arrow limitation. However, this is a commonly encountered data structure, and 'fastparquet' handles it seamlessly. Is there a proposed timeline or plan for fixing this?
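
      Until this is addressed, one possible workaround (not part of the original report, and only a sketch) is to detect cells that mix nested lists with plain values and store the whole column as JSON strings. The helper names `mixes_lists_and_scalars` and `encode_column` below are hypothetical, and the approach assumes that a string column is acceptable in the Parquet file:

```python
import json


def mixes_lists_and_scalars(cell):
    """Return True if a list cell mixes nested lists with non-list, non-null values."""
    if not isinstance(cell, list):
        return False
    non_null = [v for v in cell if v is not None]
    has_list = any(isinstance(v, list) for v in non_null)
    has_scalar = any(not isinstance(v, list) for v in non_null)
    if has_list and has_scalar:
        return True
    # Otherwise check each nested list for the same kind of mixing.
    return any(mixes_lists_and_scalars(v) for v in non_null if isinstance(v, list))


def encode_column(values):
    """Hypothetical helper: JSON-encode every cell if any cell mixes types.

    All cells are encoded together so the column keeps a single type (string).
    """
    if any(mixes_lists_and_scalars(v) for v in values):
        return [json.dumps(v) for v in values]
    return list(values)


# The cell from the repro, plus an ordinary list cell for comparison.
column = [[24, 27, [1, 1]], [1, 2, 3]]
encoded = encode_column(column)

# The original values can be recovered after reading the file back.
decoded = [json.loads(s) for s in encoded]
```

      With the repro above, this could be applied as `x["A"] = encode_column(x["A"].tolist())` before calling `to_parquet`, then reversed with `json.loads` after reading. The trade-off is that the column loses its nested Parquet type and must be decoded by every reader.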


          People

            Assignee: Unassigned
            Reporter: Karthik (KartCpp)
            Votes: 0
            Watchers: 1

            Dates

              Created:
              Updated: