Apache Drill / DRILL-7675

Very slow performance and memory exhaustion while querying a very small dataset of Parquet files


Details

    Description

Per our discussion on Slack and the dev list, here are all the details and a sample dataset to recreate the problematic query behavior:

      • We are using Drill 1.18.0-SNAPSHOT, built on March 6
      • We are joining two small Parquet datasets residing on S3 using the following query:
      SELECT
       CASE
        WHEN tbl1.`timestamp` IS NULL THEN tbl2.`timestamp`
        ELSE tbl1.`timestamp`
       END AS ts, *
       FROM `s3-store.state`.`/164` AS tbl1
       FULL OUTER JOIN `s3-store.result`.`/164` AS tbl2
       ON tbl1.`timestamp` * 10 = tbl2.`timestamp`
       ORDER BY ts ASC
       LIMIT 500 OFFSET 0 ROWS
      
      • We are running Drill in a single-node setup on a 16-core, 64 GB RAM machine. The Drill heap size is set to 16 GB, while max direct memory is set to 32 GB.
      • As the dataset consists of really small files, Drill has been tuned to parallelize on small record counts via the following options (a sketch of how to apply them follows the stack trace below):
      planner.slice_target = 25
      planner.width.max_per_node = 16 (to match the core count)
      • Without the above parallelization, queries on these Parquet files are very slow (tens of seconds)
      • While the queries do work, we are seeing disproportionate direct memory/heap utilization: up to 20 GB of direct memory used, and a minimum of 12 GB of heap required (a query for monitoring these figures follows the attachment note below)
      • We're still encountering the occasional out-of-memory error (we're also seeing heap exhaustion, but I guess that's another symptom of the same problem). Reducing the node parallelization width to, say, 8 reduces memory contention, though direct memory usage still reaches 8 GB:
      User Error Occurred: One or more nodes ran out of memory while executing the query. (null)
       org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.null[Error Id: 67b61fc9-320f-47a1-8718-813843a10ecc ]
       at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:657)
       at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:338)
       at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)
       Caused by: org.apache.drill.exec.exception.OutOfMemoryException: null
       at org.apache.drill.exec.vector.complex.AbstractContainerVector.allocateNew(AbstractContainerVector.java:59)
       at org.apache.drill.exec.test.generated.PartitionerGen5$OutgoingRecordBatch.allocateOutgoingRecordBatch(PartitionerTemplate.java:380)
       at org.apache.drill.exec.test.generated.PartitionerGen5$OutgoingRecordBatch.initializeBatch(PartitionerTemplate.java:400)
       at org.apache.drill.exec.test.generated.PartitionerGen5.setup(PartitionerTemplate.java:126)
       at org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.createClassInstances(PartitionSenderRootExec.java:263)
       at org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.createPartitioner(PartitionSenderRootExec.java:218)
       at org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext(PartitionSenderRootExec.java:188)
       at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:93)
       at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:323)
       at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:310)
       at java.security.AccessController.doPrivileged(Native Method)
       at javax.security.auth.Subject.doAs(Subject.java:422)
       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
       at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:310)
       ... 4 common frames omitted
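
      For reference, here is a minimal sketch of one way to apply the two parallelization options listed above, assuming they are set system-wide from a Drill client (sqlline or the web UI) rather than in a config file:

       -- Lower the slice target so small inputs are still split into parallel fragments
       ALTER SYSTEM SET `planner.slice_target` = 25;
       -- Cap the per-node parallelization width at the 16 available cores
       ALTER SYSTEM SET `planner.width.max_per_node` = 16;

      Using ALTER SESSION SET instead scopes the same change to the current connection only.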

      I've attached a (real!) sample dataset to match the query above. That same dataset recreates the aforementioned memory behavior.
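
      For anyone reproducing this, the heap/direct figures above can be sampled from inside Drill itself; a minimal sketch, assuming the sys.memory system table (column names may vary slightly between versions, so SELECT * FROM sys.memory is the safe form):

       -- Snapshot current vs. maximum heap and direct memory for each drillbit
       SELECT hostname, heap_current, heap_max, direct_current, direct_max
       FROM sys.memory;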

      Help, please.

      Idan


      Attachments

        sample-dataset.zip (1.19 MB, Idan Sheinberg)


            People

              Paul Rogers
              Idan Sheinberg
              Arina Ielchiieva
              Votes: 0
              Watchers: 4
