Details
Bug
Status: Closed
Major
Resolution: Fixed
0.8.0
None
Description
While testing implicit casts during joins, I ran into an issue: if you repeatedly run a query that throws an exception during execution, Drill eventually runs out of memory.
Here is a query example:
select count(*) from cast_tbl_1 a, cast_tbl_2 b where a.c_float = b.c_time
failed: RemoteRpcException: Failure while running fragment., Failure finding function that runtime code generation expected. Signature: compare_to_nulls_high( TIME:OPTIONAL, FLOAT4:OPTIONAL ) returns INT:REQUIRED [ 633c8ce3-1ed2-4a0a-8248-1e3d5b4f7c0a on atsqa4-133.qa.lab:31010 ]
[ 633c8ce3-1ed2-4a0a-8248-1e3d5b4f7c0a on atsqa4-133.qa.lab:31010 ]
Test_Failed: 2015/03/10 18:34:15.0015 - Failed to execute.
If you set planner.slice_target to 1, you hit out of memory after roughly 40 such failures on my cluster:
select count(*) from cast_tbl_1 a, cast_tbl_2 b where a.d38 = b.c_double
Query failed: OutOfMemoryException: You attempted to create a new child allocator with initial reservation 3000000 but only 916199 bytes of memory were available.
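The reproduction described above can be sketched as the following session. This is only an illustration: the `alter session` statement is an assumption about how the option was set (the report does not show the exact command), and the join is the failing query quoted earlier.

```sql
-- Assumed way of setting the option for the current session;
-- the report only says planner.slice_target was set to 1.
alter session set `planner.slice_target` = 1;

-- Failing query from the report. Rerun it repeatedly (roughly 40 times
-- on the reporter's cluster) until queries fail with OutOfMemoryException.
select count(*) from cast_tbl_1 a, cast_tbl_2 b where a.c_float = b.c_time;
```

Once available memory drops below the 3000000-byte initial reservation, any new query on that drillbit is expected to fail at fragment setup, as in the second error above.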
From drillbit.log:
2015-03-10 18:34:34,588 [2b00c6c5-5525-ae65-25f8-24ea2d88ba2f:foreman] INFO o.a.d.e.store.parquet.FooterGatherer - Fetch Parquet Footers: Executed 1 out of 1 using 1 threads. Time: 1ms total, 1.190007ms avg, 1ms max.
2015-03-10 18:34:34,591 [2b00c6c5-5525-ae65-25f8-24ea2d88ba2f:foreman] INFO o.a.d.e.store.parquet.FooterGatherer - Fetch Parquet Footers: Executed 1 out of 1 using 1 threads. Time: 0ms total, 0.953679ms avg, 0ms max.
2015-03-10 18:34:34,627 [2b00c6c5-5525-ae65-25f8-24ea2d88ba2f:foreman] INFO o.a.d.e.s.schedule.BlockMapBuilder - Failure finding Drillbit running on host atsqa4-136.qa.lab. Skipping affinity to that host.
2015-03-10 18:34:34,627 [2b00c6c5-5525-ae65-25f8-24ea2d88ba2f:foreman] INFO o.a.d.e.s.parquet.ParquetGroupScan - Load Parquet RowGroup block maps: Executed 1 out of 1 using 1 threads. Time: 1ms total, 1.609586ms avg, 1ms max.
2015-03-10 18:34:34,629 [2b00c6c5-5525-ae65-25f8-24ea2d88ba2f:foreman] INFO o.a.d.e.s.schedule.BlockMapBuilder - Failure finding Drillbit running on host atsqa4-136.qa.lab. Skipping affinity to that host.
2015-03-10 18:34:34,629 [2b00c6c5-5525-ae65-25f8-24ea2d88ba2f:foreman] INFO o.a.d.e.s.parquet.ParquetGroupScan - Load Parquet RowGroup block maps: Executed 1 out of 1 using 1 threads. Time: 1ms total, 1.270340ms avg, 1ms max.
2015-03-10 18:34:34,684 [2b00c6c5-5525-ae65-25f8-24ea2d88ba2f:foreman] INFO o.a.drill.exec.work.foreman.Foreman - State change requested. PENDING --> FAILED
org.apache.drill.exec.work.foreman.ForemanException: Unexpected exception during fragment initialization: Failure while getting memory allocator for fragment.
    at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:195) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
    at org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:303) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_71]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_71]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
Caused by: org.apache.drill.common.exceptions.ExecutionSetupException: Failure while getting memory allocator for fragment.
    at org.apache.drill.exec.ops.FragmentContext.<init>(FragmentContext.java:119) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
    at org.apache.drill.exec.work.foreman.Foreman.setupRootFragment(Foreman.java:535) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
    at org.apache.drill.exec.work.foreman.Foreman.runPhysicalPlan(Foreman.java:307) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
    at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:511) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
    at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:186) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
    ... 4 common frames omitted
Caused by: org.apache.drill.exec.memory.OutOfMemoryException: You attempted to create a new child allocator with initial reservation 3000000 but only 916199 bytes of memory were available.
    at org.apache.drill.exec.memory.TopLevelAllocator.getChildAllocator(TopLevelAllocator.java:121) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
    at org.apache.drill.exec.ops.FragmentContext.<init>(FragmentContext.java:116) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
    ... 8 common frames omitted
2015-03-10 18:34:34,700 [2b00c6c5-5525-ae65-25f8-24ea2d88ba2f:foreman] ERROR o.a.drill.exec.work.foreman.Foreman - Error 96a7baf4-f17a-454c-831b-f3dc77bd4381: OutOfMemoryException: You attempted to create a new child allocator with initial reservation 3000000 but only 916199 bytes of memory were available.
org.apache.drill.exec.work.foreman.ForemanException: Unexpected exception during fragment initialization: Failure while getting memory allocator for fragment.
    at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:195) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
    at org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:303) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_71]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_71]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
Caused by: org.apache.drill.common.exceptions.ExecutionSetupException: Failure while getting memory allocator for fragment.
    at org.apache.drill.exec.ops.FragmentContext.<init>(FragmentContext.java:119) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
    at org.apache.drill.exec.work.foreman.Foreman.setupRootFragment(Foreman.java:535) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
    at org.apache.drill.exec.work.foreman.Foreman.runPhysicalPlan(Foreman.java:307) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
    at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:511) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
    at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:186) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
    ... 4 common frames omitted
Caused by: org.apache.drill.exec.memory.OutOfMemoryException: You attempted to create a new child allocator with initial reservation 3000000 but only 916199 bytes of memory were available.
    at org.apache.drill.exec.memory.TopLevelAllocator.getChildAllocator(TopLevelAllocator.java:121) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
    at org.apache.drill.exec.ops.FragmentContext.<init>(FragmentContext.java:116) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
    ... 8 common frames omitted
2015-03-10 18:34:34,700 [2b00c6c5-5525-ae65-25f8-24ea2d88ba2f:foreman] INFO o.a.drill.exec.work.foreman.Foreman - foreman cleaning up - status: [0=>[0=>FragmentData [isLocal=true, status=profile {
I will attach a reproduction. I should add that I have no proof that the failing queries are actually causing the memory leak; that is speculation on my part.