Apache Arrow / ARROW-6901

[Rust][Parquet] SerializedFileWriter writes total_num_rows as zero

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.14.1, 0.15.0
    • Fix Version/s: 0.16.0
    • Component/s: Rust

      Description

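      SerializedFileWriter never updates total_num_rows, so the file metadata always records zero as the number of rows in the file, regardless of how many rows were actually written.

      The following code fails at the final assertion, because five rows are written but the metadata reports zero: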

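      use std::{fs::File, rc::Rc};

      use parquet::basic::{Repetition, Type};
      use parquet::column::writer::ColumnWriter;
      use parquet::file::properties::WriterProperties;
      use parquet::file::reader::{FileReader, SerializedFileReader};
      use parquet::file::writer::{FileWriter, SerializedFileWriter};
      use parquet::schema::types;

      let data = vec![vec![1, 2, 3, 4, 5]];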
      let file: File = ...;
      
      let schema = Rc::new(
          types::Type::group_type_builder("schema")
              .with_fields(&mut vec![Rc::new(
                  types::Type::primitive_type_builder("col1", Type::INT32)
                      .with_repetition(Repetition::REQUIRED)
                      .build()
                      .unwrap(),
              )])
              .build()
              .unwrap(),
      );
      let props = Rc::new(WriterProperties::builder().build());
      let mut file_writer =
          SerializedFileWriter::new(file.try_clone().unwrap(), schema, props).unwrap();
      let mut rows: i64 = 0;
      
      for subset in &data {
          let mut row_group_writer = file_writer.next_row_group().unwrap();
          let col_writer = row_group_writer.next_column().unwrap();
          if let Some(mut writer) = col_writer {
              match writer {
                  ColumnWriter::Int32ColumnWriter(ref mut typed) => {
                      rows += typed.write_batch(&subset[..], None, None).unwrap() as i64;
                  }
                  _ => {
                      unimplemented!();
                  }
              }
              row_group_writer.close_column(writer).unwrap();
          }
          file_writer.close_row_group(row_group_writer).unwrap();
      }
      file_writer.close().unwrap();
      
      let reader = SerializedFileReader::new(file).unwrap();
      assert_eq!(reader.num_row_groups(), data.len());
      assert_eq!(reader.metadata().file_metadata().num_rows(), rows, "row count in metadata not equal to number of rows written");
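
      The expected behaviour can be illustrated with a small standalone model. This is only a sketch: ToyFileWriter, RowGroupMeta and footer_row_count are hypothetical names invented for illustration, not types from the parquet crate, and the sketch does not claim to show how the actual fix was implemented. It assumes the file-level row count should be accumulated whenever a row group is closed.

```rust
// Toy model only: these types are hypothetical and are NOT the parquet crate's API.

/// Metadata recorded for one closed row group.
struct RowGroupMeta {
    num_rows: i64,
}

/// Stand-in for a file writer that tracks the file-level row count.
struct ToyFileWriter {
    row_groups: Vec<RowGroupMeta>,
    total_num_rows: i64,
}

impl ToyFileWriter {
    fn new() -> Self {
        ToyFileWriter {
            row_groups: Vec::new(),
            total_num_rows: 0,
        }
    }

    /// Closing a row group is the point at which its rows become part of the
    /// file, so the running total is updated here. Leaving out this update
    /// reproduces the symptom described in this issue (a total of zero).
    fn close_row_group(&mut self, num_rows: i64) {
        self.total_num_rows += num_rows;
        self.row_groups.push(RowGroupMeta { num_rows });
    }

    /// Consumes the writer and returns the row count destined for the footer.
    fn close(self) -> i64 {
        // Sanity check: the total must equal the sum over all row groups.
        let summed: i64 = self.row_groups.iter().map(|rg| rg.num_rows).sum();
        debug_assert_eq!(summed, self.total_num_rows);
        self.total_num_rows
    }
}

/// Writes each inner vector as one row group and returns the footer row count.
fn footer_row_count(data: &[Vec<i32>]) -> i64 {
    let mut writer = ToyFileWriter::new();
    for subset in data {
        writer.close_row_group(subset.len() as i64);
    }
    writer.close()
}
```

      In this model, calling footer_row_count on the data from the reproduction above yields the number of values written, which is what the final assertion in the reproduction expects from the real file metadata.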
      

              People

              • Assignee: Matthew Franglen (bw-matthew)
              • Reporter: Matthew Franglen (bw-matthew)
              • Votes: 0
              • Watchers: 2

                  Time Tracking

                  • Original Estimate: Not Specified
                  • Remaining Estimate: 0h
                  • Time Spent: 1h 10m