Apache Drill / DRILL-3665

Deadlock while executing CTAS that runs out of memory

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: 1.2.0
    • Fix Version/s: 1.11.0
    • Component/s: Execution - Flow
    • Labels: None

    Description

      I had a CTAS query run out of memory, and after that the drillbit was rendered unusable:

      0: jdbc:drill:schema=dfs> create table lineitem as select
      . . . . . . . . . . . . >     cast(columns[0] as int) l_orderkey,
      . . . . . . . . . . . . >     cast(columns[1] as int) l_partkey,
      . . . . . . . . . . . . >     cast(columns[2] as int) l_suppkey,
      . . . . . . . . . . . . >     cast(columns[3] as int) l_linenumber,
      . . . . . . . . . . . . >     cast(columns[4] as double) l_quantity,
      . . . . . . . . . . . . >     cast(columns[5] as double) l_extendedprice,
      . . . . . . . . . . . . >     cast(columns[6] as double) l_discount,
      . . . . . . . . . . . . >     cast(columns[7] as double) l_tax,
      . . . . . . . . . . . . >     cast(columns[8] as varchar(200)) l_returnflag,
      . . . . . . . . . . . . >     cast(columns[9] as varchar(200)) l_linestatus,
      . . . . . . . . . . . . >     cast(columns[10] as date) l_shipdate,
      . . . . . . . . . . . . >     cast(columns[11] as date) l_commitdate,
      . . . . . . . . . . . . >     cast(columns[12] as date) l_receiptdate,
      . . . . . . . . . . . . >     cast(columns[13] as varchar(200)) l_shipinstruct,
      . . . . . . . . . . . . >     cast(columns[14] as varchar(200)) l_shipmode,
      . . . . . . . . . . . . >     cast(columns[15] as varchar(200)) l_comment
      . . . . . . . . . . . . > from `lineitem.dat`;
      Error: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.
      Fragment 1:10
      [Error Id: 11084315-5388-4500-b165-642a5f595ebf on atsqa4-133.qa.lab:31010] (state=,code=0)
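
      For triage, the Error Id and host printed in the message above can be pulled out and used to search the attached drillbit.log. A minimal sketch, assuming the log records the same id; the helper name `parse_error_ids` is hypothetical and not part of Drill:

      ```python
      import re

      # Matches the "[Error Id: <uuid> on <host>:<port>]" suffix seen in the sqlline errors.
      ERROR_ID = re.compile(
          r"\[Error Id: (?P<id>[0-9a-f-]{36}) on (?P<host>[^:\]\s]+):(?P<port>\d+)\]"
      )

      def parse_error_ids(text):
          """Return (error_id, host, port) tuples for every Error Id found in text."""
          return [(m["id"], m["host"], int(m["port"])) for m in ERROR_ID.finditer(text)]
      ```

      Each returned id can then be searched for in the log of the drillbit named by the host field.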
      

      Here is Drill's behavior after that:

      1. Tried to run "select * from sys.options" in the same sqlline session: it hangs.

      2. Was able to start a new sqlline session and connect to the drillbit:

      • If you try running anything on this connection, it hangs.
      • If you press ^C, you may get a result if you are lucky (these queries then appear as "CANCELLATION_REQUESTED" in the Web UI).
        (I only tried querying sys.memory and sys.options, which possibly take a different code path than queries against actual user data.)
      • If you are not lucky, you get the error below:
                0: jdbc:drill:schema=dfs> show files;
                java.lang.RuntimeException: java.sql.SQLException: Unexpected RuntimeException: java.lang.IllegalArgumentException: Buffer has negative reference count.
                at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73)
                at sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
                at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
                at sqlline.SqlLine.print(SqlLine.java:1583)
                at sqlline.Commands.execute(Commands.java:852)
                at sqlline.Commands.sql(Commands.java:751)
                at sqlline.SqlLine.dispatch(SqlLine.java:738)
                at sqlline.SqlLine.begin(SqlLine.java:612)
                at sqlline.SqlLine.start(SqlLine.java:366)
                at sqlline.SqlLine.main(SqlLine.java:259)
        

      or maybe something like this:

      0: jdbc:drill:schema=dfs> select count(*) from nation group by n_regionkey;
      Error: CONNECTION ERROR: Exceeded timeout (5000) while waiting send intermediate work fragments to remote nodes. Sent 1 and only heard response back from 0 nodes.
      [Error Id: 6abce8e9-78a1-4b3d-bcec-503930482b40 on atsqa4-133.qa.lab:31010] (state=,code=0)
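
      The three outcomes described above (a result, an error, or a hang) can be told apart automatically by running each attempt under a client-side timeout. A minimal sketch, not tied to any Drill client: `probe` and `run_query` are hypothetical names, and `run_query` stands for whatever callable submits the query in your environment.

      ```python
      import threading

      def probe(run_query, timeout_s=5.0):
          """Classify one query attempt as ('ok', result), ('error', exc) or ('hang', None)."""
          outcome = {}

          def worker():
              try:
                  outcome["result"] = run_query()
              except Exception as exc:
                  outcome["error"] = exc

          # Daemon thread: a query that never returns cannot keep the probe process alive.
          t = threading.Thread(target=worker, daemon=True)
          t.start()
          t.join(timeout_s)

          if t.is_alive():
              return ("hang", None)
          if "error" in outcome:
              return ("error", outcome["error"])
          return ("ok", outcome["result"])
      ```

      The probe only stops waiting; it does not cancel the query on the server, so a hung attempt still has to be cancelled separately.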
      

      I'm attaching the jstack output and drillbit.log. So far I have not been able to reproduce this problem (working on it).
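
      A jstack dump like the attached one can be summarized quickly before reading it thread by thread. The sketch below is an illustration only: it counts thread states and checks for the "Java-level deadlock" banner that jstack prints when it detects a monitor or lock cycle. The function name `summarize_jstack` is hypothetical, and a hang caused by something other than such a cycle would not produce that banner.

      ```python
      import re
      from collections import Counter

      def summarize_jstack(text):
          """Count thread states in a jstack dump and note whether jstack reported a deadlock."""
          states = Counter(re.findall(r"java\.lang\.Thread\.State: (\w+)", text))
          return {
              "states": dict(states),
              "deadlock_reported": "Java-level deadlock" in text,
          }
      ```

      A large number of BLOCKED or WAITING threads with no deadlock banner would point toward threads waiting on a resource that is never released, rather than a classic lock cycle.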

      Attachments

        1. drillbit.log.drill-3665
          793 kB
          Victoria Markman
        2. jstack.txt
          103 kB
          Victoria Markman

          People

            Assignee: Roman Kulyk
            Reporter: Victoria Markman
            Votes: 0
            Watchers: 3