Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
I discovered this bug with the following query:

> SELECT tpep_pickup_datetime FROM taxi LIMIT 1;

which fails with this error:

> General("InvalidArgumentError(\"column types must match schema types, expected Timestamp(Microsecond, None) but found UInt64 at column index 0\")")
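For reference, here is a minimal standalone reproduction, sketched against the Rust parquet crate's Arrow reader API of this era; the file path is a placeholder for the NYC taxi dataset, and iteration details may differ between crate versions:

use std::fs::File;
use std::sync::Arc;

use parquet::arrow::{ArrowReader, ParquetFileArrowReader};
use parquet::file::reader::SerializedFileReader;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Placeholder path for the NYC taxi Parquet file queried above.
    let file = File::open("taxi.parquet")?;
    let file_reader = Arc::new(SerializedFileReader::new(file)?);
    let mut arrow_reader = ParquetFileArrowReader::new(file_reader);

    // Reports Timestamp(Microsecond, None) for tpep_pickup_datetime.
    println!("{:?}", arrow_reader.get_schema()?);

    // Materializing a batch trips the RecordBatch type check shown below,
    // because the column actually comes back as UInt64.
    for batch in arrow_reader.get_record_reader(1024)? {
        let _batch = batch?;
    }
    Ok(())
}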
The Parquet reader detects this schema when reading from the file:
Schema { fields: [ Field { name: "tpep_pickup_datetime", data_type: Timestamp(Microsecond, None), nullable: true, dict_id: 0, dict_is_ordered: false } ], metadata: {} }
The struct array read from the file contains:
[PrimitiveArray<UInt64> [ 1567318008000000, 1567319357000000, 1567320092000000, 1567321151000000,
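Note that the values themselves are valid epoch microseconds (early September 2019); only the array type is wrong. A minimal sketch of the array the schema actually promises, built from the first two values above using the arrow crate's TimestampMicrosecondArray:

use arrow::array::{Array, TimestampMicrosecondArray};
use arrow::datatypes::{DataType, TimeUnit};

fn main() {
    // The same raw values, wrapped in the array type the schema declares.
    let expected =
        TimestampMicrosecondArray::from(vec![1567318008000000, 1567319357000000]);
    assert_eq!(
        expected.data_type(),
        &DataType::Timestamp(TimeUnit::Microsecond, None)
    );
}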
When the Parquet arrow reader creates the record batch, the following validation logic fails:

for i in 0..columns.len() {
    if columns[i].len() != len {
        return Err(ArrowError::InvalidArgumentError(
            "all columns in a record batch must have the same length".to_string(),
        ));
    }
    if columns[i].data_type() != schema.field(i).data_type() {
        return Err(ArrowError::InvalidArgumentError(format!(
            "column types must match schema types, expected {:?} but found {:?} at column index {}",
            schema.field(i).data_type(),
            columns[i].data_type(),
            i
        )));
    }
}
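The failure can be reproduced without Parquet at all; a minimal sketch that hands RecordBatch::try_new a UInt64 column under a timestamp schema, as the reader effectively does:

use std::sync::Arc;

use arrow::array::{ArrayRef, UInt64Array};
use arrow::datatypes::{DataType, Field, Schema, TimeUnit};
use arrow::record_batch::RecordBatch;

fn main() {
    // Schema declares a timestamp column...
    let schema = Arc::new(Schema::new(vec![Field::new(
        "tpep_pickup_datetime",
        DataType::Timestamp(TimeUnit::Microsecond, None),
        true,
    )]));
    // ...but the column is handed over as UInt64, mirroring the reader's output.
    let column: ArrayRef = Arc::new(UInt64Array::from(vec![1567318008000000u64]));

    // Fails with: column types must match schema types, expected
    // Timestamp(Microsecond, None) but found UInt64 at column index 0
    let result = RecordBatch::try_new(schema, vec![column]);
    assert!(result.is_err());
}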
Issue Links
- relates to: ARROW-8425 [Rust] [Parquet] Add support for writing temporal types (Resolved)