Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
I discovered this bug with the following query:

> SELECT tpep_pickup_datetime FROM taxi LIMIT 1;

which fails with this error:

> General("InvalidArgumentError(\"column types must match schema types, expected Timestamp(Microsecond, None) but found UInt64 at column index 0\")")
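For reference, here is a minimal standalone reproduction, sketched against the Rust parquet crate's Arrow reader API of this era; the file path is a placeholder for the NYC taxi dataset, and iteration details may differ between crate versions:

use std::fs::File;
use std::sync::Arc;

use parquet::arrow::{ArrowReader, ParquetFileArrowReader};
use parquet::file::reader::SerializedFileReader;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Placeholder path for the NYC taxi Parquet file queried above.
    let file = File::open("taxi.parquet")?;
    let file_reader = Arc::new(SerializedFileReader::new(file)?);
    let mut arrow_reader = ParquetFileArrowReader::new(file_reader);

    // Reports Timestamp(Microsecond, None) for tpep_pickup_datetime.
    println!("{:?}", arrow_reader.get_schema()?);

    // Materializing a batch trips the RecordBatch type check shown below,
    // because the column actually comes back as UInt64.
    for batch in arrow_reader.get_record_reader(1024)? {
        let _batch = batch?;
    }
    Ok(())
}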
The Parquet reader detects this schema when reading from the file:
Schema { fields: [ Field { name: "tpep_pickup_datetime", data_type: Timestamp(Microsecond, None), nullable: true, dict_id: 0, dict_is_ordered: false } ], metadata: {} }
The struct array read from the file contains:
[PrimitiveArray<UInt64> [ 1567318008000000, 1567319357000000, 1567320092000000, 1567321151000000,
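Note that the values themselves are valid epoch microseconds (early September 2019); only the array type is wrong. A minimal sketch of the array the schema actually promises, built from the first two values above using the arrow crate's TimestampMicrosecondArray:

use arrow::array::{Array, TimestampMicrosecondArray};
use arrow::datatypes::{DataType, TimeUnit};

fn main() {
    // The same raw values, wrapped in the array type the schema declares.
    let expected =
        TimestampMicrosecondArray::from(vec![1567318008000000, 1567319357000000]);
    assert_eq!(
        expected.data_type(),
        &DataType::Timestamp(TimeUnit::Microsecond, None)
    );
}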
When the Parquet arrow reader creates the record batch, the following validation logic fails:

for i in 0..columns.len() {
    if columns[i].len() != len {
        return Err(ArrowError::InvalidArgumentError(
            "all columns in a record batch must have the same length".to_string(),
        ));
    }
    if columns[i].data_type() != schema.field(i).data_type() {
        return Err(ArrowError::InvalidArgumentError(format!(
            "column types must match schema types, expected {:?} but found {:?} at column index {}",
            schema.field(i).data_type(),
            columns[i].data_type(),
            i
        )));
    }
}
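The failure can be reproduced without Parquet at all; a minimal sketch that hands RecordBatch::try_new a UInt64 column under a timestamp schema, as the reader effectively does:

use std::sync::Arc;

use arrow::array::{ArrayRef, UInt64Array};
use arrow::datatypes::{DataType, Field, Schema, TimeUnit};
use arrow::record_batch::RecordBatch;

fn main() {
    // Schema declares a timestamp column...
    let schema = Arc::new(Schema::new(vec![Field::new(
        "tpep_pickup_datetime",
        DataType::Timestamp(TimeUnit::Microsecond, None),
        true,
    )]));
    // ...but the column is handed over as UInt64, mirroring the reader's output.
    let column: ArrayRef = Arc::new(UInt64Array::from(vec![1567318008000000u64]));

    // Fails with: column types must match schema types, expected
    // Timestamp(Microsecond, None) but found UInt64 at column index 0
    let result = RecordBatch::try_new(schema, vec![column]);
    assert!(result.is_err());
}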
Issue Links
- relates to: ARROW-8425 [Rust] [Parquet] Add support for writing temporal types (Resolved)