[ARROW-10999] [Rust] TPC-H parquet files cannot be read by Apache Spark - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 3.0.0
Component/s: Rust
Labels:
- pull-request-available

External issue URL:
https://github.com/apache/arrow/issues/26919

Description

The TPC-H Parquet files generated by the benchmark crate cannot be read by Apache Spark because their schema uses unsigned integer types, which Spark cannot read (I am guessing because Java only has signed integer types).

I would like to use the same data sets for benchmarking DataFusion, Apache Spark, and other tools.
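To illustrate the signed/unsigned mismatch described above, here is a minimal sketch in plain Rust (standard library only). It is not the code from the linked pull request: the helper names `to_signed_column` and `reinterpret_as_signed` are hypothetical, and the choice of 32-bit widths is an assumption made for illustration. It shows why a checked conversion is safer than a bit-level cast when unsigned values have to be written as signed ones.

```rust
use std::convert::TryFrom;

/// Hypothetical helper (not part of the benchmark crate): convert a column of
/// unsigned 32-bit values to signed 32-bit values, returning an error instead
/// of silently wrapping when a value does not fit.
fn to_signed_column(values: &[u32]) -> Result<Vec<i32>, String> {
    values
        .iter()
        .map(|&v| i32::try_from(v).map_err(|_| format!("value {} does not fit in i32", v)))
        .collect()
}

/// Hypothetical helper: reinterpret the bits of an unsigned value as signed.
/// Values above i32::MAX come out negative, which is the kind of surprise a
/// reader with only signed integer types can run into.
fn reinterpret_as_signed(value: u32) -> i32 {
    value as i32
}
```

Calling `to_signed_column(&[1, 2, 3])` yields the same values as `i32`, while an input containing `u32::MAX` is rejected rather than turned into `-1`. Whether the values in the TPC-H key columns always fit in a signed type is an assumption that would need to be checked against the scale factor in use.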

Issue Links

links to

GitHub Pull Request #8980

People

Assignee: Andy Grove

Reporter: Andy Grove

Votes: 0

Watchers: 3

Dates

Created: 21/Dec/20 16:05

Updated: 11/Jan/23 08:16

Resolved: 22/Dec/20 05:31
