  1. Apache Arrow
  2. ARROW-11077

[Rust] ParquetFileArrowReader panics when trying to read nested list

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: None
    • Fix Version/s: 5.0.0
    • Component/s: Rust
    • Labels: None

    Description

      I think this is documented in the code, but I can't be 100% sure.

      When executing a DataFusion query over a Parquet file where one field is a struct with a nested list, the thread panics because `.unwrap` is called on an `Option::None` at this point [..]. This `None` is returned by `visit_primitive`, but I can't quite make sense of why it returns `None` rather than an error.

      I added a couple of dbg! calls to see what the item_type and list_type are:

      [/home/ben/repos/rust/arrow/rust/parquet/src/arrow/array_reader.rs:1339] &item_type = PrimitiveType {
          basic_info: BasicTypeInfo {
              name: "item",
              repetition: Some(
                  OPTIONAL,
              ),
              logical_type: UTF8,
              id: None,
          },
          physical_type: BYTE_ARRAY,
          type_length: -1,
          scale: -1,
          precision: -1,
      }
      [/home/ben/repos/rust/arrow/rust/parquet/src/arrow/array_reader.rs:1340] &list_type = GroupType {
          basic_info: BasicTypeInfo {
              name: "tags",
              repetition: Some(
                  OPTIONAL,
              ),
              logical_type: LIST,
              id: None,
          },
          fields: [
              GroupType {
                  basic_info: BasicTypeInfo {
                      name: "list",
                      repetition: Some(
                          REPEATED,
                      ),
                      logical_type: NONE,
                      id: None,
                  },
                  fields: [
                      PrimitiveType {
                          basic_info: BasicTypeInfo {
                              name: "item",
                              repetition: Some(
                                  OPTIONAL,
                              ),
                              logical_type: UTF8,
                              id: None,
                          },
                          physical_type: BYTE_ARRAY,
                          type_length: -1,
                          scale: -1,
                          precision: -1,
                      },
                  ],
              },
          ],
      }

      I guess we should at least use `.expect` here instead of `.unwrap`, so it's clearer why this is happening!
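
      To illustrate the suggestion above, here is a minimal, std-only sketch of turning a missing reader into a descriptive error instead of a bare panic. Everything in it is an assumption made for illustration: `ReaderError`, `ItemReader`, `build_list_reader`, and this simplified `visit_primitive` are hypothetical stand-ins, not the parquet crate's actual types or signatures, and the `supported` flag merely simulates whatever condition makes the real function return `None`.

```rust
use std::fmt;

// Hypothetical error type standing in for the crate's real error type.
#[derive(Debug, PartialEq)]
struct ReaderError(String);

impl fmt::Display for ReaderError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

impl std::error::Error for ReaderError {}

// Hypothetical stand-in for an array reader over one leaf column.
#[derive(Debug, PartialEq)]
struct ItemReader {
    column: String,
}

// Simplified, hypothetical visitor: `Ok(None)` means "no reader was built
// for this column", which is distinct from a hard failure (`Err`).
fn visit_primitive(column: &str, supported: bool) -> Result<Option<ItemReader>, ReaderError> {
    if supported {
        Ok(Some(ItemReader { column: column.to_string() }))
    } else {
        Ok(None)
    }
}

// Instead of `.unwrap()` on the `Option`, convert `None` into an error that
// names the list and item involved, and let the caller decide what to do.
fn build_list_reader(
    list_name: &str,
    item_name: &str,
    supported: bool,
) -> Result<ItemReader, ReaderError> {
    visit_primitive(item_name, supported)?.ok_or_else(|| {
        ReaderError(format!(
            "no reader was built for item `{}` of list `{}`",
            item_name, list_name
        ))
    })
}

fn main() {
    // Names taken from the dbg! output above: list "tags", item "item".
    match build_list_reader("tags", "item", false) {
        Ok(reader) => println!("built reader for column {}", reader.column),
        Err(e) => eprintln!("error: {}", e),
    }
}
```

      Swapping `.unwrap()` for `.expect("...")` with a similar message would be the smaller change and would still panic; returning an error as sketched here only makes sense if the surrounding function already returns a `Result`.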

      Attachments

        1. small-nested-lists.parquet
          4 kB
          Ben Sully

          People

            Assignee: Neville Dipale (nevi_me)
            Reporter: Ben Sully (sd2k)
            Votes: 0
            Watchers: 3
