Uploaded image for project: 'Apache Arrow'
  1. Apache Arrow
  2. ARROW-6901

[Rust][Parquet] SerializedFileWriter writes total_num_rows as zero

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Minor
    • Resolution: Fixed
    • 0.14.1, 0.15.0
    • 0.16.0
    • Rust

    Description

      The SerializedFileWriter does not update total_num_rows at any point. This results in consistently writing zero as the number of rows in the file.

       

      This code will fail:

      let data = vec![vec![1, 2, 3, 4, 5]];
      let file: File = ...;
      
      let schema = Rc::new(
          types::Type::group_type_builder("schema")
              .with_fields(&mut vec![Rc::new(
                  types::Type::primitive_type_builder("col1", Type::INT32)
                      .with_repetition(Repetition::REQUIRED)
                      .build()
                      .unwrap(),
              )])
              .build()
              .unwrap(),
      );
      let props = Rc::new(WriterProperties::builder().build());
      let mut file_writer =
          SerializedFileWriter::new(file.try_clone().unwrap(), schema, props).unwrap();
      let mut rows: i64 = 0;
      
      for subset in &data {
          let mut row_group_writer = file_writer.next_row_group().unwrap();
          let col_writer = row_group_writer.next_column().unwrap();
          if let Some(mut writer) = col_writer {
              match writer {
                  ColumnWriter::Int32ColumnWriter(ref mut typed) => {
                      rows += typed.write_batch(&subset[..], None, None).unwrap() as i64;
                  }
                  _ => {
                      unimplemented!();
                  }
              }
              row_group_writer.close_column(writer).unwrap();
          }
          file_writer.close_row_group(row_group_writer).unwrap();
      }
      file_writer.close().unwrap();
      
      let reader = SerializedFileReader::new(file).unwrap();
      assert_eq!(reader.num_row_groups(), data.len());
      assert_eq!(reader.metadata().file_metadata().num_rows(), rows, "row count in metadata not equal to number of rows written");
      

      Attachments

        Issue Links

          Activity

            People

              bw-matthew Matthew Franglen
              bw-matthew Matthew Franglen
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 1h 10m
                  1h 10m