Details
- Improvement
- Status: Resolved
- Minor
- Resolution: Fixed
Description
This is a follow-up to https://github.com/apache/arrow/issues/4301.
The current implementation of the row iterator, RowIter, borrows the FileReader,
so the user has to keep the file reader alive for as long as the iterator is alive.
This makes it hard to iterate over multiple FileReader / RowIter pairs, for example:
fn main() {
    let path1 = Path::new("path-to/1.snappy.parquet");
    let path2 = Path::new("path-to/2.snappy.parquet");
    let vec = vec![path1, path2];
    let it = vec.iter()
        .map(|p| {
            File::open(p).unwrap()
        })
        .map(|f| {
            SerializedFileReader::new(f).unwrap()
        })
        .flat_map(|reader| -> RowIter {
            RowIter::from_file(None, &reader).unwrap()
            //|                      |
            //|                      `reader` is borrowed here
            //| returns a value referencing data owned by the current function
        });
    for r in it {
        println!("{}", r);
    }
}
One solution could be to implement a row iterator that takes ownership of the reader,
perhaps by implementing std::iter::IntoIterator for SerializedFileReader:
....
.map(|p| {
File::open(p).unwrap()
})
.map(|f| {
SerializedFileReader::new(f).unwrap()
})
.flat_map(|r| r.into_iter())
....
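The ownership-taking iterator proposed above can be sketched with the standard library alone. This is only an illustration of the pattern, not the parquet crate's API: `MockReader`, `OwnedRowIter`, and `all_rows` are hypothetical names, and rows are modelled as plain strings held in memory. Because the iterator stores the reader by value, it carries no lifetime parameter and can be returned from a `flat_map` closure.

```rust
/// Hypothetical stand-in for a file reader (NOT the parquet crate's type).
/// Rows are modelled as in-memory strings purely for illustration.
struct MockReader {
    rows: Vec<String>,
}

impl MockReader {
    fn new(rows: &[&str]) -> Self {
        MockReader {
            rows: rows.iter().map(|s| s.to_string()).collect(),
        }
    }
}

/// Hypothetical owning row iterator: it holds the reader by value,
/// so nothing outside the iterator has to keep the reader alive.
struct OwnedRowIter {
    reader: MockReader,
    next_index: usize,
}

impl Iterator for OwnedRowIter {
    type Item = String;

    fn next(&mut self) -> Option<String> {
        // Clone the row at the current position, if there is one.
        let row = self.reader.rows.get(self.next_index).cloned();
        if row.is_some() {
            self.next_index += 1;
        }
        row
    }
}

/// Consuming the reader yields the owning iterator.
impl IntoIterator for MockReader {
    type Item = String;
    type IntoIter = OwnedRowIter;

    fn into_iter(self) -> OwnedRowIter {
        OwnedRowIter {
            reader: self,
            next_index: 0,
        }
    }
}

/// Chains the rows of several readers, mirroring the `flat_map` call above.
fn all_rows(readers: Vec<MockReader>) -> Vec<String> {
    readers.into_iter().flat_map(|r| r.into_iter()).collect()
}
```

Calling `all_rows` with a vector of readers returns their rows in order, one reader after another. Each reader is dropped when its iterator is exhausted, which is the behaviour the borrowing RowIter cannot provide inside a closure.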
Happy to put a PR out with this.
Please let me know if this makes sense, or whether you already have some way of doing this.