[ARROW-8782] [Rust] [DataFusion] Add benchmarks based on NYC Taxi data set - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.0.0
Component/s: Rust, Rust - DataFusion
Labels:
- pull-request-available

External issue URL:
https://github.com/apache/arrow/issues/24928

Description

I plan on adding a new benchmarks folder beneath the datafusion crate, containing benchmarks based on the NYC Taxi data set. The benchmark will be a CLI that runs a number of different queries against CSV and Parquet files.

The README will contain instructions for downloading the data set.

The benchmark will produce CSV files containing results.
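As a rough illustration of the timing-and-reporting part of the plan, here is a minimal sketch that uses only the Rust standard library. It is not the code from the pull request: the `BenchResult` layout, the `time_query` and `to_csv` helpers, and the CSV column names are all assumptions made for this example, and the query itself is stood in for by an arbitrary closure rather than a real DataFusion call.

```rust
use std::time::{Duration, Instant};

/// One timed run of a named query (hypothetical record layout).
struct BenchResult {
    query: String,
    format: String,
    iteration: usize,
    elapsed: Duration,
}

/// Run `work` `iterations` times and record the wall-clock time of each run.
/// `work` is a placeholder for executing a query against CSV or Parquet.
fn time_query<F: FnMut()>(
    query: &str,
    format: &str,
    iterations: usize,
    mut work: F,
) -> Vec<BenchResult> {
    (0..iterations)
        .map(|i| {
            let start = Instant::now();
            work();
            BenchResult {
                query: query.to_string(),
                format: format.to_string(),
                iteration: i,
                elapsed: start.elapsed(),
            }
        })
        .collect()
}

/// Render the results as CSV text with a header row (assumed column names).
fn to_csv(results: &[BenchResult]) -> String {
    let mut out = String::from("query,format,iteration,elapsed_ms\n");
    for r in results {
        out.push_str(&format!(
            "{},{},{},{}\n",
            r.query,
            r.format,
            r.iteration,
            r.elapsed.as_millis()
        ));
    }
    out
}
```

A caller would pass a closure that executes one query, for example `time_query("q1", "parquet", 3, || run_q1())` where `run_q1` is a hypothetical function, and then write the string returned by `to_csv` to a file with `std::fs::write`.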

These benchmarks will allow us to manually verify performance before major releases and on an ongoing basis as we make changes to Arrow/Parquet/DataFusion.

I will be basing this on existing benchmarks I recently built in Ballista [1] (I am the only contributor to these benchmarks so far).

A Dockerfile will be provided, making it easy to restrict CPU and RAM when running these benchmarks.

[1] https://github.com/ballista-compute/ballista/tree/master/rust/benchmarks

Issue Links

links to

GitHub Pull Request #7205

People

Assignee: Andy Grove

Reporter: Andy Grove

Votes: 0

Watchers: 6

Dates

Created: 13/May/20 13:00

Updated: 11/Jan/23 08:02

Resolved: 26/May/20 15:34

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 1.5h