Apache Arrow / ARROW-8258

[Rust] [Parquet] ArrowReader fails on some timestamp types

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0
    • Component/s: Rust
    • Labels: None

    Description

      I discovered this bug while running the following query, which fails with the error shown below it:

      > SELECT tpep_pickup_datetime FROM taxi LIMIT 1;
      General("InvalidArgumentError(\"column types must match schema types, expected Timestamp(Microsecond, None) but found UInt64 at column index 0\")") 

      The parquet reader detects this schema when reading from the file:

      Schema { 
        fields: [
          Field { name: "tpep_pickup_datetime", data_type: Timestamp(Microsecond, None), nullable: true, dict_id: 0, dict_is_ordered: false }
        ], 
        metadata: {} 
      } 

      The struct array read from the file contains:

      [PrimitiveArray<UInt64>
      [
        1567318008000000,
        1567319357000000,
        1567320092000000,
        1567321151000000, 
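
      The values above look like microseconds since the Unix epoch stored in an unsigned 64-bit array, while Arrow timestamp arrays are backed by signed 64-bit integers. As a rough illustration only (not the fix that was applied), the sketch below reinterprets such raw values using the standard library alone. The helper names `to_timestamp_micros` and `to_system_time` are hypothetical and do not come from the `arrow` or `parquet` crates.

```rust
use std::convert::TryFrom;
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Hypothetical helper: checked conversion of a raw unsigned value into the
/// signed 64-bit representation a microsecond timestamp would use.
fn to_timestamp_micros(raw: u64) -> Option<i64> {
    i64::try_from(raw).ok()
}

/// Hypothetical helper: interpret non-negative microseconds since the Unix
/// epoch as a `SystemTime`. Returns `None` for negative or out-of-range input.
fn to_system_time(micros: i64) -> Option<SystemTime> {
    let micros = u64::try_from(micros).ok()?;
    UNIX_EPOCH.checked_add(Duration::from_micros(micros))
}

fn main() {
    // The four raw values quoted in the report.
    let raw: [u64; 4] = [
        1567318008000000,
        1567319357000000,
        1567320092000000,
        1567321151000000,
    ];

    for v in raw.iter() {
        let micros = to_timestamp_micros(*v).expect("value fits in i64");
        let time = to_system_time(micros).expect("value is a representable time");
        let secs = time
            .duration_since(UNIX_EPOCH)
            .expect("time is not before the epoch")
            .as_secs();
        println!("{} us -> {} s since the Unix epoch", micros, secs);
    }
}
```

      The checked conversions are deliberate: a raw value above `i64::MAX` cannot be represented as a signed timestamp, so the sketch reports that as `None` instead of wrapping silently.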

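      When the Parquet Arrow reader creates the record batch, the following validation logic fails because the column's data type (UInt64) does not match the schema's data type (Timestamp(Microsecond, None)):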

      for i in 0..columns.len() {
          if columns[i].len() != len {
              return Err(ArrowError::InvalidArgumentError(
                  "all columns in a record batch must have the same length".to_string(),
              ));
          }
          if columns[i].data_type() != schema.field(i).data_type() {
              return Err(ArrowError::InvalidArgumentError(format!(
                  "column types must match schema types, expected {:?} but found {:?} at column index {}",
                  schema.field(i).data_type(),
                  columns[i].data_type(),
                  i)));
          }
      }
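
      To make the failure easier to reproduce without a Parquet file, here is a minimal, self-contained sketch of the same check. The `DataType`, `TimeUnit`, and `Column` types and the `validate` function are simplified, hypothetical stand-ins written for this illustration; they are not the real `arrow` crate definitions.

```rust
// Simplified, hypothetical stand-ins for the Arrow types named in the report.
#[derive(Debug, Clone, PartialEq)]
enum TimeUnit {
    Microsecond,
}

#[derive(Debug, Clone, PartialEq)]
enum DataType {
    UInt64,
    Timestamp(TimeUnit, Option<String>),
}

/// A column reduced to the two properties the check inspects.
struct Column {
    data_type: DataType,
    len: usize,
}

/// Mirrors the two checks quoted above: equal lengths, then matching types.
/// Assumes `schema` has at least as many entries as `columns`.
fn validate(schema: &[DataType], columns: &[Column], len: usize) -> Result<(), String> {
    for (i, column) in columns.iter().enumerate() {
        if column.len != len {
            return Err("all columns in a record batch must have the same length".to_string());
        }
        if column.data_type != schema[i] {
            return Err(format!(
                "column types must match schema types, expected {:?} but found {:?} at column index {}",
                schema[i], column.data_type, i
            ));
        }
    }
    Ok(())
}

fn main() {
    let schema = vec![DataType::Timestamp(TimeUnit::Microsecond, None)];

    // The situation in the report: the schema says timestamp, the array is UInt64.
    let mismatched = vec![Column { data_type: DataType::UInt64, len: 4 }];
    match validate(&schema, &mismatched, 4) {
        Ok(()) => println!("unexpectedly valid"),
        Err(e) => println!("{}", e),
    }

    // A column whose type agrees with the schema passes the same check.
    let matching = vec![Column { data_type: schema[0].clone(), len: 4 }];
    assert!(validate(&schema, &matching, 4).is_ok());
}
```

      The check compares types structurally, so a column only passes if it was built with the logical type from the schema; an array holding the same numbers under a different physical type is rejected.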
       

            People

              Assignee: liurenjie1024 Renjie Liu
              Reporter: andygrove Andy Grove