[SPARK-25438] Fix FilterPushdownBenchmark to use the same memory assumption - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.4.0
Fix Version/s: 2.4.0
Component/s: SQL, Tests
Labels: None

Description

This issue aims to fix three things in `FilterPushdownBenchmark`.

1. Use the same memory assumption.
The following configurations are used in ORC and Parquet.

Memory buffer for writing:

`parquet.block.size` (default: 128MB)
`orc.stripe.size` (default: 64MB)

Compression chunk size:

`parquet.page.size` (default: 1MB)
`orc.compress.size` (default: 256KB)

SPARK-24692 used 1MB, the default value of `parquet.page.size`, for both `parquet.block.size` and `orc.stripe.size`. However, it did not also set `orc.compress.size` to match. As a result, the current benchmark compares ORC using a 256KB compression chunk against Parquet using a 1MB one. For a fair comparison, these settings need to be consistent.
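To illustrate the "same memory assumption", here is a minimal sketch (not the benchmark's actual code) that derives every buffer and chunk setting from a single size. It assumes the shared size is 1MB, the `parquet.page.size` default cited above; the helper name `writer_options` is hypothetical.

```python
# Minimal sketch, not the benchmark's actual code: derive every buffer and
# chunk setting from one size so ORC and Parquet share a memory assumption.
# Assumption: the size is 1MB, the parquet.page.size default cited above.
# The helper name writer_options is hypothetical.

ONE_MB = 1024 * 1024


def writer_options(fmt: str, size: int = ONE_MB) -> dict:
    """Return string-valued writer options for "orc" or "parquet"."""
    value = str(size)
    if fmt == "orc":
        return {
            "orc.stripe.size": value,    # memory buffer for writing
            "orc.compress.size": value,  # compression chunk size (the one SPARK-24692 left at 256KB)
        }
    if fmt == "parquet":
        return {
            "parquet.block.size": value,  # memory buffer for writing
            "parquet.page.size": value,   # compression chunk size (already 1MB by default)
        }
    raise ValueError(f"unsupported format: {fmt}")


if __name__ == "__main__":
    for fmt in ("orc", "parquet"):
        print(fmt, writer_options(fmt))
```

With PySpark, these could be passed as `df.write.mode("overwrite").options(**writer_options("orc")).orc(path)`, assuming the file-source writers pick these keys up from the write options.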

2. Do not enforce dictionary encoding for all cases.
SPARK-24206 enforced dictionary encoding for all test cases. This issue restores the default ORC behavior in general and enforces dictionary encoding only for `prepareStringDictTable`.
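A minimal sketch of "enforce dictionary encoding only where needed", assuming ORC's `orc.dictionary.key.threshold` option (default 0.8, where 1.0 means always use dictionary encoding). The helper `orc_dictionary_options` and the table name `prepareTable` are hypothetical; only `prepareStringDictTable` comes from the issue.

```python
# Minimal sketch: keep ORC's default dictionary heuristic except for tests that
# explicitly need dictionary-encoded strings.
# Assumption: ORC's "orc.dictionary.key.threshold" option (default 0.8), where
# 1.0 means always use dictionary encoding. Helper and table names are
# hypothetical, apart from prepareStringDictTable, which the issue mentions.

ORC_DEFAULT_DICT_THRESHOLD = 0.8  # passing this is the same as leaving the default
ORC_ALWAYS_DICT_THRESHOLD = 1.0   # forces dictionary encoding


def orc_dictionary_options(force_dictionary: bool = False) -> dict:
    """Return the ORC dictionary threshold option for one benchmark table."""
    threshold = ORC_ALWAYS_DICT_THRESHOLD if force_dictionary else ORC_DEFAULT_DICT_THRESHOLD
    return {"orc.dictionary.key.threshold": str(threshold)}


# Only the string-dictionary table forces dictionary encoding.
FORCE_DICTIONARY = {
    "prepareTable": False,
    "prepareStringDictTable": True,
}

if __name__ == "__main__":
    for table, force in FORCE_DICTIONARY.items():
        print(table, orc_dictionary_options(force))
```

Leaving the key unset for the general case would behave the same as passing the default; writing it explicitly just makes the intent visible in the benchmark.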

3. Generate the test results on AWS r3.xlarge.
SPARK-24206 generated the results on AWS so that they are easy to reproduce and compare. For the same reason, this issue regenerates the results on the same machine type, specifically AWS r3.xlarge with Instance Store.

Issue Links

blocks

SPARK-20901 Feature parity for ORC with Parquet

Open

relates to

SPARK-24692 Improvement FilterPushdownBenchmark

Resolved

People

Assignee: Dongjoon Hyun

Reporter: Dongjoon Hyun

Votes: 0

Watchers: 2

Dates

Created: 15/Sep/18 09:07

Updated: 16/Sep/18 01:03

Resolved: 16/Sep/18 00:49