Apache Drill
DRILL-7675

Very slow performance and memory exhaustion while querying a very small dataset of Parquet files



    Description

      Per our discussion on Slack and the dev list, here are all the details and a sample dataset to recreate the problematic query behavior:

      • We are using Drill 1.18.0-SNAPSHOT built on March 6
      • We are joining two small Parquet datasets residing on S3 using the following query:
      SELECT
       CASE
        WHEN tbl1.`timestamp` IS NULL THEN tbl2.`timestamp`
        ELSE tbl1.`timestamp`
       END AS ts, *
       FROM `s3-store.state`.`/164` AS tbl1
       FULL OUTER JOIN `s3-store.result`.`/164` AS tbl2
       ON tbl1.`timestamp` * 10 = tbl2.`timestamp`
       ORDER BY ts ASC
       LIMIT 500 OFFSET 0 ROWS
      
      • We are running Drill in a single-node setup on a 16-core, 64 GB RAM machine. The Drill heap size is set to 16 GB, while max direct memory is set to 32 GB (see the drill-env.sh sketch after the stack trace).
      • As the dataset consists of really small files, Drill has been tweaked to parallelize even on small record counts by setting the following options (see the ALTER SYSTEM example after the stack trace):
      planner.slice_target = 25
      planner.width.max_per_node = 16 (to match the core count)
      • Without the above parallelization, queries over these Parquet files are very slow (tens of seconds).
      • While queries do work, we are seeing direct memory/heap utilization that is wildly out of proportion to the dataset size (up to 20 GB of direct memory used, and a minimum of 12 GB of heap required).
      • We're still encountering occasional out-of-memory errors (we're also seeing heap exhaustion, but I guess that's another indication of the same problem). Reducing the node parallelization width to, say, 8 reduces memory contention, though direct memory usage still reaches 8 GB (see the sys.memory query after the attachment note):
      User Error Occurred: One or more nodes ran out of memory while executing the query. (null)
       org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.null[Error Id: 67b61fc9-320f-47a1-8718-813843a10ecc ]
       at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:657)
       at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:338)
       at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)
       Caused by: org.apache.drill.exec.exception.OutOfMemoryException: null
       at org.apache.drill.exec.vector.complex.AbstractContainerVector.allocateNew(AbstractContainerVector.java:59)
       at org.apache.drill.exec.test.generated.PartitionerGen5$OutgoingRecordBatch.allocateOutgoingRecordBatch(PartitionerTemplate.java:380)
       at org.apache.drill.exec.test.generated.PartitionerGen5$OutgoingRecordBatch.initializeBatch(PartitionerTemplate.java:400)
       at org.apache.drill.exec.test.generated.PartitionerGen5.setup(PartitionerTemplate.java:126)
       at org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.createClassInstances(PartitionSenderRootExec.java:263)
       at org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.createPartitioner(PartitionSenderRootExec.java:218)
       at org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext(PartitionSenderRootExec.java:188)
       at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:93)
       at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:323)
       at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:310)
       at java.security.AccessController.doPrivileged(Native Method)
       at javax.security.auth.Subject.doAs(Subject.java:422)
       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
       at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:310)
       ... 4 common frames omitted
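
      For reference, the heap and direct memory limits mentioned above are set through the standard drill-env.sh variables. A minimal sketch with the values from this setup (adjust to your install):

       export DRILL_HEAP="16G"
       export DRILL_MAX_DIRECT_MEMORY="32G"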
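
      Likewise, the planner options above can be applied at runtime via ALTER SYSTEM (or ALTER SESSION for session-only scope):

       ALTER SYSTEM SET `planner.slice_target` = 25;
       ALTER SYSTEM SET `planner.width.max_per_node` = 16;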

      I've attached a (real!) sample dataset matching the query above. That same dataset recreates the aforementioned memory behavior.
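
      To watch utilization while the query runs, a quick probe of Drill's sys.memory system table (a single Drillbit here; column names as of recent 1.x releases) looks like:

       SELECT hostname, heap_current, heap_max, direct_current, direct_max
       FROM sys.memory;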

      Help, please.

      Idan

       

      People

        Paul Rogers
        Idan Sheinberg
        Arina Ielchiieva