Apache Drill: DRILL-5519

Sort fails to spill and results in an OOM


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.10.0
    • Fix Version/s: 1.12.0
    • Component/s: None
    • Labels: None

    Description

      Setup:

      git.commit.id.abbrev=1e0a14c
      DRILL_MAX_DIRECT_MEMORY="32G"
      DRILL_MAX_HEAP="4G"
      Number of nodes in the Drill cluster: 1
      

      The query below fails with an out-of-memory (OOM) error in the in-memory sort code, which indicates that the logic deciding when to spill is flawed.

      0: jdbc:drill:zk=10.10.100.190:5181> ALTER SESSION SET `exec.sort.disable_managed` = false;
      +-------+-------------------------------------+
      |  ok   |               summary               |
      +-------+-------------------------------------+
      | true  | exec.sort.disable_managed updated.  |
      +-------+-------------------------------------+
      1 row selected (1.022 seconds)
      0: jdbc:drill:zk=10.10.100.190:5181> alter session set `planner.memory.max_query_memory_per_node` = 334288000;
      +-------+----------------------------------------------------+
      |  ok   |                      summary                       |
      +-------+----------------------------------------------------+
      | true  | planner.memory.max_query_memory_per_node updated.  |
      +-------+----------------------------------------------------+
      1 row selected (0.369 seconds)
      0: jdbc:drill:zk=10.10.100.190:5181> select count(*) from (select * from (select flatten(flatten(lst_lst)) num from dfs.`/drill/testdata/resource-manager/nested-large.json`) d order by d.num) d1 where d1.num < -1;
      Error: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.
      
      Unable to allocate buffer of size 4194304 (rounded from 3200000) due to memory limit. Current allocation: 16015936
      Fragment 2:2
      
      [Error Id: 4d9cc59a-b5d1-4ca9-9b26-69d9438f0bee on qa-node190.qa.lab:31010] (state=,code=0)
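The error message shows the allocator rounding a 3,200,000-byte request up to 4,194,304 bytes (2^22), the next power of two, so the sort needed 994,304 bytes more than it asked for. The Java sketch below only reproduces that rounding arithmetic; the class and the `nextPowerOfTwo` helper are illustrative names, not code taken from Drill's `BaseAllocator`.

```java
public class AllocationRounding {
  // Round a request up to the next power of two.
  // Requests that are already powers of two are returned unchanged.
  static long nextPowerOfTwo(long n) {
    if (n <= 1) {
      return 1;
    }
    return Long.highestOneBit(n - 1) << 1;
  }

  public static void main(String[] args) {
    long requested = 3_200_000L; // size reported as "rounded from 3200000"
    long rounded = nextPowerOfTwo(requested);
    // Prints: 3200000 -> 4194304 (994304 bytes of rounding overhead)
    System.out.println(requested + " -> " + rounded
        + " (" + (rounded - requested) + " bytes of rounding overhead)");
  }
}
```

Any memory estimate made from the requested size rather than the rounded size can therefore undercount by up to almost 50% for a single buffer.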
      

      Below is the exception from the log:

      2017-05-16 13:46:33,233 [26e49afc-cf45-637b-acc1-a70fee7fe7e2:frag:2:2] INFO  o.a.d.e.w.fragment.FragmentExecutor - User Error Occurred: One or more nodes ran out of memory while executing the query. (Unable to allocate buffer of size 4194304 (rounded from 3200000) due to memory limit. Current allocation: 16015936)
      org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.
      
      Unable to allocate buffer of size 4194304 (rounded from 3200000) due to memory limit. Current allocation: 16015936
      
      [Error Id: 4d9cc59a-b5d1-4ca9-9b26-69d9438f0bee ]
              at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544) ~[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
              at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:244) [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
              at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_111]
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_111]
              at java.lang.Thread.run(Thread.java:745) [na:1.7.0_111]
      Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Unable to allocate buffer of size 4194304 (rounded from 3200000) due to memory limit. Current allocation: 16015936
              at org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:220) ~[drill-memory-base-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
              at org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:195) ~[drill-memory-base-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
              at org.apache.drill.exec.test.generated.MSorterGen44.setup(MSortTemplate.java:91) ~[na:na]
              at org.apache.drill.exec.physical.impl.xsort.managed.MergeSort.merge(MergeSort.java:110) ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
              at org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.sortInMemory(ExternalSortBatch.java:1159) ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
              at org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.load(ExternalSortBatch.java:687) ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
              at org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.innerNext(ExternalSortBatch.java:559) ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
              at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
              at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215) ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
              at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
              at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
              at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
              at org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:93) ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
              at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
              at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215) ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
              at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
              at org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:92) ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
              at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
              at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:234) ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
              at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:227) ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
              at java.security.AccessController.doPrivileged(Native Method) ~[na:1.7.0_111]
              at javax.security.auth.Subject.doAs(Subject.java:415) ~[na:1.7.0_111]
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595) ~[hadoop-common-2.7.0-mapr-1607.jar:na]
              at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:227) [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
              ... 4 common frames omitted
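One plausible way a spill decision can go wrong is by estimating the merge buffer at its requested size rather than its allocated, power-of-two size. The Java sketch below contrasts the two estimates. Everything in it is hypothetical: the method names, the assumption of a 4-bytes-per-record merge index, and the 20,000,000-byte limit are illustrative and are not Drill's `ExternalSortBatch` logic. Only the 16,015,936-byte current allocation and the 3,200,000-byte request come from the error above.

```java
public class SpillCheckSketch {
  // Round a request up to the next power of two, matching the rounding
  // reported in the error message (3200000 -> 4194304).
  static long nextPowerOfTwo(long n) {
    if (n <= 1) {
      return 1;
    }
    return Long.highestOneBit(n - 1) << 1;
  }

  // Hypothetical check that ignores allocator rounding.
  static boolean naiveCanSortInMemory(long memoryLimit, long bufferedBytes, long recordCount) {
    return bufferedBytes + 4L * recordCount <= memoryLimit;
  }

  // Hypothetical check that reserves the rounded size of the merge buffer.
  static boolean canSortInMemory(long memoryLimit, long bufferedBytes, long recordCount) {
    long mergeBuffer = nextPowerOfTwo(4L * recordCount); // assumed 4 bytes per record
    return bufferedBytes + mergeBuffer <= memoryLimit;
  }

  public static void main(String[] args) {
    long memoryLimit = 20_000_000L;   // hypothetical operator memory limit
    long bufferedBytes = 16_015_936L; // "Current allocation" from the error message
    long recordCount = 800_000L;      // assumed: 3200000 bytes / 4 bytes per record
    System.out.println("naive check allows in-memory sort: "
        + naiveCanSortInMemory(memoryLimit, bufferedBytes, recordCount));
    System.out.println("rounded check allows in-memory sort: "
        + canSortInMemory(memoryLimit, bufferedBytes, recordCount));
  }
}
```

With these numbers the naive estimate (19,215,936 bytes) approves an in-memory sort that actually needs 20,210,240 bytes, which exceeds the hypothetical limit; the rounded estimate would have chosen to spill instead.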
      

      I have attached the log and profile files. The data is too large to attach here.

      Attachments

        1. drillbit.out
          0.2 kB
          Rahul Kumar Challapalli
        2. drillbit.log
          4.96 MB
          Rahul Kumar Challapalli
        3. drill-env.sh
          1 kB
          Rahul Kumar Challapalli
        4. 26e49afc-cf45-637b-acc1-a70fee7fe7e2.sys.drill
          35 kB
          Rahul Kumar Challapalli

            People

              Assignee: Paul Rogers (paul-rogers)
              Reporter: Rahul Kumar Challapalli (rkins)