[ARROW-10226] [Rust] [Parquet] Parquet reader reading wrong columns in some batches within a parquet file - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.0.0
Component/s: Rust, Rust - DataFusion
Labels:
None

External issue URL:
https://github.com/apache/arrow/issues/26226

Description

I reinstalled my desktop a few days ago (now running Ubuntu 20.04 LTS). Since then, when I try to run the TPC-H benchmark, it never completes and eventually uses up all 64 GB of RAM.

I can run Spark against the same data set, and the query completes in 24 seconds, which, if I recall correctly, is how long it took before.

It is possible that something is odd in my environment, but it is also possible, even likely, that this is a real bug.

I am investigating this and will update the Jira once I know more.

I also went back to older commits that had worked for me before, and they show the same issue, so I don't think this is related to a recent code change.

People

Assignee: Andy Grove

Reporter: Andy Grove

Votes: 0

Watchers: 5

Dates

Created: 07/Oct/20 23:05

Updated: 11/Jan/23 08:11

Resolved: 08/Oct/20 18:11