[DRILL-7675] Very slow performance and Memory exhaustion while querying on very small dataset of parquet files - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: 1.18.0
Fix Version/s: 1.18.0
Component/s: Query Planning & Optimization, Storage - Parquet
Labels:
- ready-to-commit
Environment: sample-dataset.zip

Description

Per our discussion on Slack and the dev list, here are the details and a sample dataset to reproduce the problematic query behavior:

We are using Drill 1.18.0-SNAPSHOT, built on March 6.
We are joining two small Parquet datasets residing on S3 using the following query:

SELECT 
 CASE
 WHEN tbl1.`timestamp` IS NULL THEN tbl2.`timestamp`
 ELSE tbl1.`timestamp`
 END AS ts, *
 FROM `s3-store.state`.`/164` AS tbl1
 FULL OUTER JOIN `s3-store.result`.`/164` AS tbl2
 ON tbl1.`timestamp`*10 = tbl2.`timestamp`
 ORDER BY ts ASC
 LIMIT 500 OFFSET 0 ROWS

We are running Drill in a single-node setup on a 16-core machine with 64GB of RAM. Drill heap size is set to 16GB, and max direct memory is set to 32GB.
As the dataset consists of really small files, Drill has been tuned to parallelize on small item counts by setting the following options:

planner.slice_target = 25
planner.width.max_per_node = 16 (to match the core count)
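
For reference, a minimal sketch of how these two options could be applied from a SQL client. The report does not say whether they were set at session or system scope, so the session scope used here is an assumption; `ALTER SYSTEM SET` would apply them to all sessions instead.

```sql
-- Assumption: session scope. Use ALTER SYSTEM SET to change the options for every session.
-- Lower the per-fragment row estimate at which the planner adds parallelism.
ALTER SESSION SET `planner.slice_target` = 25;
-- Cap the number of parallel minor fragments per node (16 matches the core count in this report).
ALTER SESSION SET `planner.width.max_per_node` = 16;
```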

Without the above parallelization, queries on the Parquet files are very slow (tens of seconds).
While queries do work, direct memory and heap utilization are out of proportion to the data size (up to 20GB of direct memory used, and a minimum of 12GB of heap required).
We still encounter occasional out-of-memory errors (we also see heap exhaustion, but I guess that is another indication of the same problem). Reducing the per-node parallelization width to, say, 8 reduces memory contention, though direct memory usage still reaches 8GB.

User Error Occurred: One or more nodes ran out of memory while executing the query. (null)
 org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.null[Error Id: 67b61fc9-320f-47a1-8718-813843a10ecc ]
 at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:657)
 at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:338)
 at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
 Caused by: org.apache.drill.exec.exception.OutOfMemoryException: null
 at org.apache.drill.exec.vector.complex.AbstractContainerVector.allocateNew(AbstractContainerVector.java:59)
 at org.apache.drill.exec.test.generated.PartitionerGen5$OutgoingRecordBatch.allocateOutgoingRecordBatch(PartitionerTemplate.java:380)
 at org.apache.drill.exec.test.generated.PartitionerGen5$OutgoingRecordBatch.initializeBatch(PartitionerTemplate.java:400)
 at org.apache.drill.exec.test.generated.PartitionerGen5.setup(PartitionerTemplate.java:126)
 at org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.createClassInstances(PartitionSenderRootExec.java:263)
 at org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.createPartitioner(PartitionSenderRootExec.java:218)
 at org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext(PartitionSenderRootExec.java:188)
 at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:93)
 at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:323)
 at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:310)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
 at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:310)
 ... 4 common frames omitted

I've attached a (real!) sample dataset to match the query above. That same dataset reproduces the aforementioned memory behavior.

Help, please.

Idan

Attachments

sample-dataset.zip
29/Mar/20 00:49
1.19 MB
Idan Sheinberg

Issue Links

is related to
- DRILL-7686 Excessive memory use in partition sender (Open)
- DRILL-7687 Inaccurate memory estimates in hash join (Open)

links to
- GitHub Pull Request #2047

People

Assignee: Paul Rogers
Reporter: Idan Sheinberg
Reviewer: Arina Ielchiieva
Votes: 0
Watchers: 4

Dates

Created: 29/Mar/20 00:50
Updated: 11/Apr/20 15:38
Resolved: 11/Apr/20 15:38