Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
1.10.0
None
None
Description
[Git Commit ID will be updated soon]
The query below, which uses the managed sort, causes an OOM error due to insufficient heap, which is a bug in itself.
ALTER SESSION SET `exec.sort.disable_managed` = false;
+-------+-------------------------------------+
|  ok   |               summary               |
+-------+-------------------------------------+
| true  | exec.sort.disable_managed updated.  |
+-------+-------------------------------------+
1 row selected (1.096 seconds)
0: jdbc:drill:zk=10.10.100.183:5181> alter session set `planner.memory.max_query_memory_per_node` = 14106127360;
+-------+----------------------------------------------------+
|  ok   |                      summary                       |
+-------+----------------------------------------------------+
| true  | planner.memory.max_query_memory_per_node updated.  |
+-------+----------------------------------------------------+
1 row selected (0.253 seconds)
0: jdbc:drill:zk=10.10.100.183:5181> alter session set `planner.width.max_per_node` = 1;
+-------+--------------------------------------+
|  ok   |               summary                |
+-------+--------------------------------------+
| true  | planner.width.max_per_node updated.  |
+-------+--------------------------------------+
1 row selected (0.184 seconds)
0: jdbc:drill:zk=10.10.100.183:5181> select * from (select * from dfs.`/drill/testdata/resource-manager/250wide.tbl` order by columns[0])d where d.columns[0] = 'ljdfhwuehnoiueyf';
Once the OOM happens, chaos follows:
1. Dangling fragments are left behind.
2. The query fails, but ZooKeeper thinks it is still running.
3. Client connections time out.
4. The profile page shows the same query as both running and failed.
We should be handling this situation more gracefully as this could be perceived as a drillbit stability issue. I attached the jstack. The logs and data set used are too big to upload here. Reach out to me if you need more information.