Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Version: 3.0.0
Description
While looking for a way to make loading array data from parquet files faster, I stumbled on an edge case: string and binary arrays are created with an incorrect length when they are collected from an iterator whose declared upper size bound does not match the number of items it actually yields.
Here is a simple example:
```
use arrow::array::{Array, StringArray};

fn main() {
    // iterator that doesn't declare an (upper) size bound
    let string_iter = (0..)
        .scan(0usize, |pos, i| {
            if *pos < 10 {
                *pos += 1;
                Some(Some(format!("value {}", i)))
            } else {
                // stops after yielding 10 values
                None
            }
        })
        // limited using take()
        .take(100);

    let (lower_size_bound, upper_size_bound) = string_iter.size_hint();
    assert_eq!(lower_size_bound, 0);
    // the upper bound, defined by take above, is 100
    assert_eq!(upper_size_bound, Some(100));

    let string_array: StringArray = string_iter.collect();
    // but the actual number of items in the array is 10
    assert_eq!(string_array.len(), 10);
}
```
Fortunately, this is easy to fix by taking the array length from the child offset array rather than from the iterator's size hint. I will create a PR for this shortly.
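To illustrate the idea, here is a minimal std-only sketch of how the length can be derived from the offsets rather than from `size_hint()`. It is not the actual arrow implementation: `build_offsets_and_values` and `len_from_offsets` are hypothetical helpers, and the null bitmap is left out for brevity.

```rust
/// Hypothetical helper (not part of the arrow crate): consumes an iterator of
/// optional strings and builds the offsets and value bytes that a
/// variable-length string array would store. Null bitmap handling is omitted.
fn build_offsets_and_values<I>(iter: I) -> (Vec<i32>, Vec<u8>)
where
    I: Iterator<Item = Option<String>>,
{
    // The first offset is always 0; one more offset is pushed per item.
    let mut offsets: Vec<i32> = vec![0];
    let mut values: Vec<u8> = Vec::new();

    for item in iter {
        if let Some(s) = item {
            values.extend_from_slice(s.as_bytes());
        }
        // A null item repeats the previous offset (zero-length slot).
        offsets.push(values.len() as i32);
    }
    (offsets, values)
}

/// Hypothetical helper: the number of items is the number of offsets minus
/// one, regardless of what the iterator's size hint claimed.
fn len_from_offsets(offsets: &[i32]) -> usize {
    offsets.len() - 1
}
```

With the iterator from the example above, `len_from_offsets` would report 10 even though the size hint's upper bound is 100, because only the items actually yielded contribute an offset.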