Apache Drill / DRILL-3206

Memory leak in window functions

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.1.0
    • Component/s: Execution - Flow
    • Labels:
    • Environment:

      21cc578b6b8c8f3ca1ebffd3dbb92e35d68bc726

      Description

      The test was run on a 4-node cluster running CentOS.

      Size in bytes of the JSON data file, followed by the failing window query:

      [root@centos-01 ~]# hadoop fs -ls /tmp/twoKeyJsn.json
      -rwxr-xr-x   3 root root  888409136 2015-04-20 18:32 /tmp/twoKeyJsn.json
      
      0: jdbc:drill:schema=dfs.tmp> select count(key1) over(partition by key2 order by key1) from `twoKeyJsn.json`;
      java.lang.RuntimeException: java.sql.SQLException: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.
      
      Fragment 1:7
      
      [Error Id: 8ffc94b9-1318-4841-9247-259155e97202 on centos-02.qa.lab:31010]
      	at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73)
      	at sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:85)
      	at sqlline.TableOutputFormat.print(TableOutputFormat.java:116)
      	at sqlline.SqlLine.print(SqlLine.java:1583)
      	at sqlline.Commands.execute(Commands.java:852)
      	at sqlline.Commands.sql(Commands.java:751)
      	at sqlline.SqlLine.dispatch(SqlLine.java:738)
      	at sqlline.SqlLine.begin(SqlLine.java:612)
      	at sqlline.SqlLine.start(SqlLine.java:366)
      	at sqlline.SqlLine.main(SqlLine.java:259)
      

      Memory usage after the above query was executed:

      0: jdbc:drill:schema=dfs.tmp> select * from sys.memory;
      +-------------------+------------+---------------+-------------+-----------------+---------------------+-------------+
      |     hostname      | user_port  | heap_current  |  heap_max   | direct_current  | jvm_direct_current  | direct_max  |
      +-------------------+------------+---------------+-------------+-----------------+---------------------+-------------+
      | centos-01.qa.lab  | 31010      | 1304067160    | 4294967296  | 110019091       | 520095827           | 8589934592  |
      | centos-03.qa.lab  | 31010      | 2020130800    | 4294967296  | 301360965       | 738199649           | 8589934592  |
      | centos-02.qa.lab  | 31010      | 1253034864    | 4294967296  | 156397935       | 553649232           | 8589934592  |
      | centos-04.qa.lab  | 31010      | 385872528     | 4294967296  | 203721765       | 553649246           | 8589934592  |
      +-------------------+------------+---------------+-------------+-----------------+---------------------+-------------+
      4 rows selected (0.134 seconds)
      

      Memory details after rerunning the query. direct_current is higher on every node than in the previous snapshot, so we are leaking memory.

      0: jdbc:drill:schema=dfs.tmp> select count(key1) over(partition by key2 order by key1) from `twoKeyJsn.json`;
      java.lang.RuntimeException: java.sql.SQLException: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.
      
      Fragment 1:7
      
      [Error Id: fe56b1ff-02b6-4ded-a317-d753ab211f5b on centos-03.qa.lab:31010]
      	at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73)
      	at sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:85)
      	at sqlline.TableOutputFormat.print(TableOutputFormat.java:116)
      	at sqlline.SqlLine.print(SqlLine.java:1583)
      	at sqlline.Commands.execute(Commands.java:852)
      	at sqlline.Commands.sql(Commands.java:751)
      	at sqlline.SqlLine.dispatch(SqlLine.java:738)
      	at sqlline.SqlLine.begin(SqlLine.java:612)
      	at sqlline.SqlLine.start(SqlLine.java:366)
      	at sqlline.SqlLine.main(SqlLine.java:259)
      0: jdbc:drill:schema=dfs.tmp> select * from sys.memory;
      +-------------------+------------+---------------+-------------+-----------------+---------------------+-------------+
      |     hostname      | user_port  | heap_current  |  heap_max   | direct_current  | jvm_direct_current  | direct_max  |
      +-------------------+------------+---------------+-------------+-----------------+---------------------+-------------+
      | centos-01.qa.lab  | 31010      | 2414546008    | 4294967296  | 438149911       | 905971795           | 8589934592  |
      | centos-02.qa.lab  | 31010      | 1953483632    | 4294967296  | 901110416       | 1442841680          | 8589934592  |
      | centos-03.qa.lab  | 31010      | 297329544     | 4294967296  | 560852951       | 1308624993          | 8589934592  |
      | centos-04.qa.lab  | 31010      | 458157528     | 4294967296  | 740156752       | 1207960670          | 8589934592  |
      +-------------------+------------+---------------+-------------+-----------------+---------------------+-------------+
      4 rows selected (0.118 seconds)
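
To make the comparison between the two sys.memory snapshots concrete, here is a minimal Python sketch that subtracts the first direct_current reading from the second for each node. The values are copied from the two tables above; the function and variable names are illustrative only. Note that sys.memory is a point-in-time reading and the first snapshot was itself taken after a failed run, so these differences show growth between runs rather than an exact leak size.

```python
# direct_current (bytes) per node, copied from the two sys.memory snapshots above.
after_first_run = {
    "centos-01.qa.lab": 110019091,
    "centos-02.qa.lab": 156397935,
    "centos-03.qa.lab": 301360965,
    "centos-04.qa.lab": 203721765,
}
after_second_run = {
    "centos-01.qa.lab": 438149911,
    "centos-02.qa.lab": 901110416,
    "centos-03.qa.lab": 560852951,
    "centos-04.qa.lab": 740156752,
}


def direct_memory_growth(before, after):
    """Return the per-host change in direct_current between two snapshots."""
    return {host: after[host] - before[host] for host in before}


growth = direct_memory_growth(after_first_run, after_second_run)
for host, delta in sorted(growth.items()):
    print(f"{host}: +{delta} bytes ({delta / 2**20:.1f} MiB)")
```

Every node shows a positive difference, which is consistent with direct memory not being released after the failed query.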
      

      There are 16 distinct partitions (PARTITION BY key2):

      0: jdbc:drill:schema=dfs.tmp> select distinct key2 from `twoKeyJsn.json`;
      +-------+
      | key2  |
      +-------+
      | d     |
      | c     |
      | b     |
      | 1     |
      | a     |
      | 0     |
      | k     |
      | m     |
      | j     |
      | h     |
      | e     |
      | n     |
      | g     |
      | f     |
      | l     |
      | i     |
      +-------+
      16 rows selected (28.967 seconds)
      
      

      Details from drillbit.log

      error_type: SYSTEM
          message: "SYSTEM ERROR: java.lang.IllegalStateException: Failure while closing accountor.  Expected private and shared pools to be set to initial values.  However, one or more were not.  Stats are\n\tzone\tinit\tallocated\tdelta \n\tprivate\t1000000\t0\t1000000 \n\tshared\t9999000000\t9928320966\t70679034.\n\nFragment 1:8\n\n[Error Id: b7b41c03-1122-4fa4-b441-9aa10544a91e on centos-02.qa.lab:31010]"
          exception {
            exception_class: "java.lang.IllegalStateException"
            message: "Failure while closing accountor.  Expected private and shared pools to be set to initial values.  However, one or more were not.  Stats are\n\tzone\tinit\tallocated\tdelta \n\tprivate\t1000000\t0\t1000000 \n\tshared\t9999000000\t9928320966\t70679034."
            stack_trace {
              class_name: "org.apache.drill.exec.memory.AtomicRemainder"
              file_name: "AtomicRemainder.java"
              line_number: 200
              method_name: "close"
              is_native_method: false
            }
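
The accountor statistics in the message above can be checked with a little arithmetic. The following Python sketch parses the tab-separated stats (copied from the log message, with the trailing period and trailing space trimmed) and confirms that, in this message, each delta equals init minus the value in the allocated column. The helper name is hypothetical, and reading the summed delta as the amount of memory not returned to the pools is an interpretation, not something the log states.

```python
# Stats text copied from the drillbit.log message above (tab-separated,
# trailing "." and trailing space removed).
stats = (
    "zone\tinit\tallocated\tdelta\n"
    "private\t1000000\t0\t1000000\n"
    "shared\t9999000000\t9928320966\t70679034\n"
)


def parse_accountor_stats(text):
    """Parse the stats table into a list of dicts, skipping the header row."""
    rows = [line.split("\t") for line in text.strip().splitlines()[1:]]
    return [
        {"zone": r[0], "init": int(r[1]), "allocated": int(r[2]), "delta": int(r[3])}
        for r in rows
    ]


rows = parse_accountor_stats(stats)
for row in rows:
    # In this message, delta equals init minus the "allocated" column.
    assert row["init"] - row["allocated"] == row["delta"]

# Combined delta across the private and shared pools, in bytes.
total_delta = sum(row["delta"] for row in rows)
print(total_delta)
```

Under that reading, the private and shared pools together were about 71.7 million bytes away from their initial values when the accountor for fragment 1:8 was closed.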
      

      Stack trace

      org.apache.drill.common.exceptions.UserRemoteException: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.
      
      Fragment 1:7
      
      [Error Id: fe56b1ff-02b6-4ded-a317-d753ab211f5b on centos-03.qa.lab:31010]
              at org.apache.drill.exec.work.foreman.QueryManager$1.statusUpdate(QueryManager.java:458) [drill-java-exec-1.0.0-mapr-r1-rebuffed.jar:1.0.0-mapr-r1]
              at org.apache.drill.exec.rpc.control.WorkEventBus.statusUpdate(WorkEventBus.java:71) [drill-java-exec-1.0.0-mapr-r1-rebuffed.jar:1.0.0-mapr-r1]
              at org.apache.drill.exec.work.batch.ControlMessageHandler.handle(ControlMessageHandler.java:79) [drill-java-exec-1.0.0-mapr-r1-rebuffed.jar:1.0.0-mapr-r1]
              at org.apache.drill.exec.rpc.control.ControlServer.handle(ControlServer.java:61) [drill-java-exec-1.0.0-mapr-r1-rebuffed.jar:1.0.0-mapr-r1]
              at org.apache.drill.exec.rpc.control.ControlServer.handle(ControlServer.java:38) [drill-java-exec-1.0.0-mapr-r1-rebuffed.jar:1.0.0-mapr-r1]
              at org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:61) [drill-java-exec-1.0.0-mapr-r1-rebuffed.jar:1.0.0-mapr-r1]
              at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:233) [drill-java-exec-1.0.0-mapr-r1-rebuffed.jar:1.0.0-mapr-r1]
              at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:205) [drill-java-exec-1.0.0-mapr-r1-rebuffed.jar:1.0.0-mapr-r1]
              at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89) [netty-codec-4.0.27.Final.jar:4.0.27.Final]
              at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
              at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
              at io.netty.handler.timeout.ReadTimeoutHandler.channelRead(ReadTimeoutHandler.java:150) [netty-handler-4.0.27.Final.jar:4.0.27.Final]
              at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
              at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
              at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) [netty-codec-4.0.27.Final.jar:4.0.27.Final]
              at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
              at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
              at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242) [netty-codec-4.0.27.Final.jar:4.0.27.Final]
              at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
              at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
              at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
              at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
              at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
              at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
              at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:618) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
              at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:329) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
              at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:250) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
              at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) [netty-common-4.0.27.Final.jar:4.0.27.Final]
              at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
      

              People

              • Assignee: Deneche A. Hakim (adeneche)
              • Reporter: Khurram Faraaz (khfaraaz)
              • Votes: 0
              • Watchers: 2
