Apache Arrow / ARROW-13151

[Python] Unable to read single child field of struct column from Parquet

Details

    Description

      Given the following table:

      import pyarrow as pa
      import pyarrow.parquet as pq
      data = {"root": [[{"addr": {"this": 3, "that": 3}}]]}
      table = pa.Table.from_pydict(data)
      

      reading a single child field of the nested struct column raises a pyarrow.lib.ArrowInvalid error:

      pq.write_table(table, "/tmp/table.parquet")
      file = pq.ParquetFile("/tmp/table.parquet")
      array = file.read(["root.list.item.addr.that"])
      

      Traceback:

      Traceback (most recent call last):
        File "....", line 21, in <module>
          array = file.read(["root.list.item.addr.that"])
        File "/home/angus/.mambaforge/envs/awkward/lib/python3.9/site-packages/pyarrow/parquet.py", line 383, in read
          return self.reader.read_all(column_indices=column_indices,
        File "pyarrow/_parquet.pyx", line 1097, in pyarrow._parquet.ParquetReader.read_all
        File "pyarrow/error.pxi", line 97, in pyarrow.lib.check_status
      pyarrow.lib.ArrowInvalid: List child array invalid: Invalid: Struct child array #0 does not match type field: struct<that: int64> vs struct<that: int64, this: int64>
      

      It's possible that I don't quite understand this properly - am I doing something wrong?

            People

              Assignee: Micah Kornfield (emkornfield)
              Reporter: Angus Hollands (agoose77)

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 5h