Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
1.21.1
-
None
-
None
Description
Drill fails to execute a query with the following exception:
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: IllegalStateException: Cleanup before finished. 0 out of 1 streams have finished Fragment: 1:0 Please, refer to logs for more information. [Error Id: 270da8f4-0bb6-4985-bf4f-34853138881c on compute7.vmcluster.com:31010] at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:657) at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:395) at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:245) at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:362) at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:829) Caused by: java.lang.IllegalStateException: Cleanup before finished. 0 out of 1 streams have finished at org.apache.drill.exec.work.batch.BaseRawBatchBuffer.close(BaseRawBatchBuffer.java:111) at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:91) at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:71) at org.apache.drill.exec.work.batch.AbstractDataCollector.close(AbstractDataCollector.java:121) at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:91) at org.apache.drill.exec.work.batch.IncomingBuffers.close(IncomingBuffers.java:144) at org.apache.drill.exec.ops.FragmentContextImpl.suppressingClose(FragmentContextImpl.java:581) at org.apache.drill.exec.ops.FragmentContextImpl.close(FragmentContextImpl.java:567) at org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources(FragmentExecutor.java:417) at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:240) ... 5 common frames omitted Suppressed: java.lang.IllegalStateException: Cleanup before finished. 0 out of 1 streams have finished ... 15 common frames omitted Suppressed: java.lang.IllegalStateException: Memory was leaked by query. Memory leaked: (32768) Allocator(op:1:0:8:UnorderedReceiver) 1000000/32768/32768/10000000000 (res/actual/peak/limit) at org.apache.drill.exec.memory.BaseAllocator.close(BaseAllocator.java:519) at org.apache.drill.exec.ops.BaseOperatorContext.close(BaseOperatorContext.java:159) at org.apache.drill.exec.ops.OperatorContextImpl.close(OperatorContextImpl.java:77) at org.apache.drill.exec.ops.FragmentContextImpl.suppressingClose(FragmentContextImpl.java:581) at org.apache.drill.exec.ops.FragmentContextImpl.close(FragmentContextImpl.java:571) ... 7 common frames omitted Suppressed: java.lang.IllegalStateException: Memory was leaked by query. Memory leaked: (1016640) Allocator(frag:1:0) 30000000/1016640/30016640/90715827882 (res/actual/peak/limit) at org.apache.drill.exec.memory.BaseAllocator.close(BaseAllocator.java:519) at org.apache.drill.exec.ops.FragmentContextImpl.suppressingClose(FragmentContextImpl.java:581) at org.apache.drill.exec.ops.FragmentContextImpl.close(FragmentContextImpl.java:574) ... 7 common frames omitted
Steps to reproduce:
1.Enable unequal join:
alter session set `planner.enable_nljoin_for_scalar_only`=false;
2. Disable join optimization to prevent Drill from flipping sides of join that may break the query execution because the NestedLoopJoin operator that executes unequal joins supports only the left join.
alter session set `planner.enable_join_optimization`=false;
3. Execute join one side which is UNION ALL with DISTINCT:
SELECT * FROM ( ( SELECT DISTINCT log_number FROM dfs.tmp.`tableWithNumber2.parquet` ) UNION ALL ( SELECT DISTINCT log_number FROM dfs.tmp.`tableWithNumber2.parquet` ) ) t1 LEFT JOIN ( SELECT log_number AS server_number FROM dfs.tmp.`tableWithNumber2.parquet` ) t3 ON ( t3.server_number >= t1.log_number )
Parquet file for the reproduce: tableWithNumber2.parquet. It contains 10 000 rows, with a single column of random double values. The file was generated by Drill:
apache drill> select *, sqlTypeOf(log_number) as log_number_column_type from dfs.tmp.`tableWithNumber2.parquet` limit 10; +------------+------------------------+ | log_number | log_number_column_type | +------------+------------------------+ | 4.0 | DOUBLE | | 5.0 | DOUBLE | | 4.0 | DOUBLE | | 3.0 | DOUBLE | | 4.0 | DOUBLE | | 0.0 | DOUBLE | | 0.0 | DOUBLE | | 8.0 | DOUBLE | | 3.0 | DOUBLE | | 4.0 | DOUBLE | +------------+------------------------+ 10 rows selected (0.175 seconds)
Also attaching the profile of the failed query.