Apache Arrow / ARROW-5317

[Rust] [Parquet] impl IntoIterator for SerializedFileReader

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.14.0
    • Component/s: Rust

    Description

      This is a follow-up to https://github.com/apache/arrow/issues/4301.

      The current row iterator, RowIter, borrows the FileReader, so the user has to
      keep the file reader alive for as long as the iterator is alive.

      This makes it hard to iterate over multiple FileReader / RowIter pairs:

      // Import paths assume the parquet crate layout at the time of this issue.
      use std::fs::File;
      use std::path::Path;

      use parquet::file::reader::SerializedFileReader;
      use parquet::record::reader::RowIter;

      fn main() {
          let path1 = Path::new("path-to/1.snappy.parquet");
          let path2 = Path::new("path-to/2.snappy.parquet");
          let vec = vec![path1, path2];
          let it = vec.iter()
              .map(|p| {
                  File::open(p).unwrap()
              })
              .map(|f| {
                  SerializedFileReader::new(f).unwrap()
              })
              .flat_map(|reader| -> RowIter {
                  // Does not compile: `reader` is borrowed here, so the closure
                  // returns a value referencing data owned by the current function.
                  RowIter::from_file(None, &reader).unwrap()
              });

          for r in it {
              println!("{}", r);
          }
      }
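
With a borrowing iterator, one possible workaround is to collect the readers into a container that outlives the iteration and borrow from that. The following is only a minimal sketch of the idea: `Reader`, `RowIter::from_reader` and `all_rows` are hypothetical stand-ins so that the example is self-contained, and they are not the parquet crate's API.

```rust
// Hypothetical stand-in for a file reader; NOT the parquet crate's type.
struct Reader {
    rows: Vec<String>,
}

// Hypothetical borrowing iterator, analogous in spirit to RowIter:
// it holds a reference, so the reader must outlive it.
struct RowIter<'a> {
    reader: &'a Reader,
    pos: usize,
}

impl<'a> RowIter<'a> {
    fn from_reader(reader: &'a Reader) -> Self {
        RowIter { reader, pos: 0 }
    }
}

impl<'a> Iterator for RowIter<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<&'a str> {
        // Copy the shared reference out so the row borrows from the reader,
        // not from the iterator itself.
        let reader: &'a Reader = self.reader;
        let row = reader.rows.get(self.pos)?;
        self.pos += 1;
        Some(row.as_str())
    }
}

// The readers live in the caller's slice, so every RowIter created inside
// flat_map borrows data that outlives the whole iteration.
fn all_rows(readers: &[Reader]) -> Vec<String> {
    readers
        .iter()
        .flat_map(|r| RowIter::from_reader(r))
        .map(|row| row.to_string())
        .collect()
}
```

The cost of this approach is that every reader has to be created and kept alive up front, which is the inconvenience described above.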
      

      One solution could be to implement a row iterator that takes ownership of the reader,
      perhaps by implementing std::iter::IntoIterator for SerializedFileReader:

      ....
      .map(|p| {
          File::open(p).unwrap()
      })
      .map(|f| {
          SerializedFileReader::new(f).unwrap()
      })
      .flat_map(|r| r.into_iter())
      ....
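
The shape of the proposal can be sketched as follows. This is an illustration under stated assumptions, not the implementation that ended up in the parquet crate: `Reader`, `OwnedRowIter`, `open` and `read_all` are hypothetical names, and rows are modelled as plain strings to keep the example self-contained.

```rust
// Hypothetical stand-in for SerializedFileReader; NOT the parquet crate's type.
struct Reader {
    rows: Vec<String>,
}

// Owning iterator: it stores the reader itself rather than a reference,
// so it can be returned from a closure without lifetime problems.
struct OwnedRowIter {
    reader: Reader,
    pos: usize,
}

impl Iterator for OwnedRowIter {
    type Item = String;

    fn next(&mut self) -> Option<String> {
        let row = self.reader.rows.get(self.pos)?.clone();
        self.pos += 1;
        Some(row)
    }
}

// Consuming the reader hands ownership to the iterator.
impl IntoIterator for Reader {
    type Item = String;
    type IntoIter = OwnedRowIter;

    fn into_iter(self) -> OwnedRowIter {
        OwnedRowIter { reader: self, pos: 0 }
    }
}

// Hypothetical helper standing in for File::open + SerializedFileReader::new.
fn open(rows: &[&str]) -> Reader {
    Reader {
        rows: rows.iter().map(|r| r.to_string()).collect(),
    }
}

// Mirrors the chain in the issue: build each reader, then flatten its rows.
fn read_all(sources: Vec<Vec<&str>>) -> Vec<String> {
    sources
        .into_iter()
        .map(|rows| open(&rows))
        .flat_map(|reader| reader.into_iter())
        .collect()
}
```

Because each `OwnedRowIter` owns its reader, the reader is dropped when that iterator is exhausted, so only one reader needs to be alive at a time.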
      

       

      Happy to put up a PR for this.
      Please let me know if this makes sense, or if you already have some way of doing this.

            People

              Assignee: Unassigned
              Reporter: Fabio Silva (FabioBatSilva)
              Votes: 0
              Watchers: 3

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 1.5h