ARROW-13120 (Apache Arrow)

[Rust][Parquet] Cannot read multiple batches from parquet with string list column


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Later
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Rust
    • Labels: None

    Description

      This issue occurs only when the batch size is smaller than the number of rows in the file. The attached Parquet file `test.parquet` has 31430 rows and a single column of string lists. The issue does not appear to occur with Parquet files that have integer list columns.
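      To see why the batch size matters, it helps to list where each batch starts. The plain-Rust sketch below (the `batch_ranges` helper is hypothetical, not part of the arrow or parquet crates, and assumes batches are filled in order with a fixed size) splits the 31430 rows into batches of 1024. Only the first batch begins at row 0; every later batch begins part-way into the column, which is the situation that the single-batch case never exercises.

```rust
/// Hypothetical helper: returns (start_row, len) for each batch when
/// `total_rows` rows are read `batch_size` at a time, in order.
fn batch_ranges(total_rows: usize, batch_size: usize) -> Vec<(usize, usize)> {
    assert!(batch_size > 0, "batch size must be positive");
    (0..total_rows)
        .step_by(batch_size)
        .map(|start| (start, batch_size.min(total_rows - start)))
        .collect()
}

fn main() {
    // Figures from the issue: 31430 rows read with a batch size of 1024.
    let ranges = batch_ranges(31430, 1024);
    println!(
        "{} batches, last batch has {} rows",
        ranges.len(),
        ranges.last().unwrap().1
    );

    // Every batch after the first starts at a nonzero row.
    assert!(ranges.iter().skip(1).all(|&(start, _)| start > 0));
}
```

      With these numbers the reader produces 31 batches: 30 full batches of 1024 rows and a final batch of 710 rows.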

        

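      #[test]
      fn failing_test() {
          // `get_test_reader` is a helper in this test module that opens the attached file.
          let parquet_file_reader = get_test_reader("test.parquet");
          let mut arrow_reader = ParquetFileArrowReader::new(parquet_file_reader);

          // 1024 is smaller than the file's 31430 rows, so several batches are read.
          let record_batch_reader = arrow_reader.get_record_reader(1024).unwrap();

          let mut record_batches = Vec::new();
          for batch in record_batch_reader {
              // The panic below is raised while iterating the reader.
              record_batches.push(batch.unwrap());
          }

          // If reading succeeded, every row should be accounted for.
          let total_rows: usize = record_batches.iter().map(|b| b.num_rows()).sum();
          assert_eq!(total_rows, 31430);
      }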
      

       

      ---- arrow::arrow_reader::tests::failing_test stdout ----
      thread 'arrow::arrow_reader::tests::failing_test' panicked at 'Expected infallable creation of GenericListArray from ArrayDataRef failed: InvalidArgumentError("offsets do not start at zero")', arrow/src/array/array_list.rs:195:45
      note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
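      For context: an Arrow list array locates each row's values through an offsets buffer, and the error message shows that `GenericListArray` rejected offsets whose first entry was not zero. A later batch whose offsets are copied directly from the middle of the column would start at a nonzero value. The sketch below illustrates that situation and one way to rebase such offsets in plain Rust. It is an assumption-laden illustration only: `rebase_offsets` and the sample column are hypothetical, it does not use the arrow crate's API, and it is not the fix for this issue (which was closed with resolution "Later").

```rust
/// Hypothetical helper: shifts a window of list offsets so it starts at zero.
/// Returns the rebased offsets and the original first offset, which is where
/// the window's child values begin.
fn rebase_offsets(offsets: &[i32]) -> (Vec<i32>, i32) {
    let base = offsets.first().copied().unwrap_or(0);
    let rebased = offsets.iter().map(|&o| o - base).collect();
    (rebased, base)
}

fn main() {
    // Hypothetical string-list column: [["a","b"], ["c"], ["d","e","f"], ["g"]]
    let values = ["a", "b", "c", "d", "e", "f", "g"];
    let offsets = [0, 2, 3, 6, 7];

    // A second "batch" holding rows 2..4 sees the offsets window [3, 6, 7],
    // which does not start at zero.
    let window = &offsets[2..5];
    let (rebased, base) = rebase_offsets(window);
    println!("raw {:?} -> rebased {:?}", window, rebased);

    // The batch's child values are the slice starting at `base`.
    let start = base as usize;
    let end = (base + *rebased.last().unwrap()) as usize;
    println!("values {:?}", &values[start..end]);
}
```

      Running it prints `raw [3, 6, 7] -> rebased [0, 3, 4]` followed by `values ["d", "e", "f", "g"]`, the child values for rows 2 and 3.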
      

       

       

      Attachments

        1. test.parquet
          46 kB
          Morgan Cassels


          People

            Assignee: Unassigned
            Reporter: Morgan Cassels (m_cassels)
            Votes: 0
            Watchers: 3
