Apache Arrow / ARROW-17540

[Python] Can not refer to field in a list of structs

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 9.0.0
    • Fix Version/s: 11.0.0
    • Component/s: Python
    • Labels: None

    Description

      When a dataset column holds nested structs, e.g. a "list<struct>" column, we cannot use `pyarrow.field(..)` to reference a sub-field of the struct.

       

      For example:

       

      import pyarrow as pa
      import pyarrow.dataset as ds
      import pandas as pd
      
      schema = pa.schema(
          [
              pa.field(
                  "objects",
                  pa.list_(
                      pa.struct(
                          [
                              pa.field("name", pa.utf8()),
                              pa.field("attr1", pa.float32()),
                              pa.field("attr2", pa.int32()),
                          ]
                      )
                  ),
              )
          ]
      )
      
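      # Note: `schema` above is not passed here, so the types are inferred
      # (attr1: double, attr2: int64), as the error message below shows.
      table = pa.Table.from_pandas(
          pd.DataFrame([{"objects": [{"name": "a", "attr1": 5.0, "attr2": 20}]}])
      )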
      print(table)
      
      dataset = ds.dataset(table)
      print(dataset)
      # Fails: "objects.attr2" is looked up as a single field name
      dataset.scanner(columns=["objects.attr2"]).to_table()
      

      which throws an exception:

      Traceback (most recent call last):
        File "foo.py", line 31, in <module>
          dataset.scanner(columns=["objects.attr2"]).to_table()
        File "pyarrow/_dataset.pyx", line 298, in pyarrow._dataset.Dataset.scanner
        File "pyarrow/_dataset.pyx", line 2356, in pyarrow._dataset.Scanner.from_dataset
        File "pyarrow/_dataset.pyx", line 2202, in pyarrow._dataset._populate_builder
        File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
      pyarrow.lib.ArrowInvalid: No match for FieldRef.Name(objects.attr2) in objects: list<item: struct<attr1: double, attr2: int64, name: string>>
      __fragment_index: int32
      __batch_index: int32
      __last_in_fragment: bool
      __filename: string
      

            People

              Assignee: Unassigned
              Reporter: Lei (Eddy) Xu (eddyxu)
              Votes: 0
              Watchers: 4
