Apache Drill / DRILL-640

Limit operator leaks memory when used with exchanges

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.4.0
    • Component/s: None
    • Labels: None

      Description

      The Limit operator currently invokes kill() when it reaches the desired number of records. The current semantics of kill() do not stop upstream operators that are across exchanges, so it is the operator's responsibility to drain all remaining record batches and release their memory; otherwise we hit memory leaks.
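
      For illustration, here is a minimal sketch of the drain-and-release pattern described above. The IterOutcome, Batch, and Upstream types are simplified stand-ins, not Drill's actual interfaces.

      // Illustrative stand-ins only; names do not match Drill's real classes.
      enum IterOutcome { OK, NONE }

      interface Batch {
        int recordCount();
        void release();              // free the memory backing this batch
      }

      interface Upstream {
        IterOutcome next();          // advance to the next batch; NONE when finished
        Batch currentBatch();
        void kill();                 // best-effort stop; does not cross exchanges
      }

      class LimitSketch {
        private final Upstream incoming;
        private final int limit;
        private int returned;

        LimitSketch(Upstream incoming, int limit) {
          this.incoming = incoming;
          this.limit = limit;
        }

        IterOutcome next() {
          if (returned >= limit) {
            // Limit satisfied: kill() alone does not stop senders on the other
            // side of an exchange, so drain whatever is still in flight and
            // release each batch; otherwise its memory is leaked.
            incoming.kill();
            while (incoming.next() != IterOutcome.NONE) {
              incoming.currentBatch().release();
            }
            return IterOutcome.NONE;
          }
          if (incoming.next() == IterOutcome.NONE) {
            return IterOutcome.NONE;
          }
          // Trimming the final batch down to the exact limit is omitted here.
          returned += incoming.currentBatch().recordCount();
          return IterOutcome.OK;
        }
      }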

        Activity

        Mehant Baid added a comment -

        github link: https://github.com/mehant/incubator-drill/tree/limit

        Mehant Baid added a comment -

        Rebased on latest master.

        Jacques Nadeau added a comment -

        merged in f2ff2c9

        Steven Phillips added a comment -

        This solution seems problematic to me. Won't we end up with queries that continue to run needlessly after the requested records have been returned? On large tables, this could result in a lot of disk and network IO.

        I think what we really need to do is solve the problem of terminating a query. Maybe it should happen like this:

        1. Limit reaches its record limit and returns IterOutcome.NONE, without calling incoming.next() to drain incoming buffers.
        2. After the result has been sent to the client, the foreman informs the leaf fragments that they should stop execution if they haven't already. The foreman then informs all intermediate fragments, and finally closes the root fragment.

        Part of closing each of these fragments will be ensuring that any batches in the IncomingBuffers queue get released. And we shouldn't have to worry about batches that arrive after the fragment has closed, because there will no longer be a socket connection to receive them.

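        For illustration, a minimal sketch of the termination order proposed in the comment above; Fragment, Batch, and the buffer queue are hypothetical stand-ins rather than Drill's actual fragment machinery.

        import java.util.ArrayDeque;
        import java.util.List;
        import java.util.Queue;

        class TerminationSketch {

          interface Batch { void release(); }

          static class Fragment {
            final Queue<Batch> incomingBuffers = new ArrayDeque<>();
            volatile boolean running = true;

            void stop() {
              running = false;       // stop producing or forwarding batches
            }

            void close() {
              // Release every batch still sitting in the incoming-buffer queue.
              // Batches arriving after close are dropped along with the closed
              // socket connection, so they cannot leak here.
              Batch b;
              while ((b = incomingBuffers.poll()) != null) {
                b.release();
              }
            }
          }

          // Foreman-driven order: leaf fragments first, then intermediates,
          // and finally the root fragment.
          static void terminate(List<Fragment> leaves, List<Fragment> intermediates, Fragment root) {
            for (Fragment f : leaves)        { f.stop(); f.close(); }
            for (Fragment f : intermediates) { f.stop(); f.close(); }
            root.close();
          }
        }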

          People

          • Assignee: Mehant Baid
          • Reporter: Mehant Baid
          • Votes: 0
          • Watchers: 3
