Uploaded image for project: 'Apache Storm'
  1. Apache Storm
  2. STORM-63

timeout request stay in queue of drpc server

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.9.2-incubating
    • Component/s: storm-core
    • Labels:
      None

      Description

      https://github.com/nathanmarz/storm/issues/430

      When a drpc request timeout, it is not removed from the queue of drpc server. This will cause memory leak.

        Activity

        Hide
        githubbot ASF GitHub Bot added a comment -

        GitHub user xiaokang opened a pull request:

        https://github.com/apache/incubator-storm/pull/33

        STORM-63 remove timeout drpc request from its function's request queue

        It's for jira issue https://issues.apache.org/jira/browse/STORM-63. It prevent a timed-out drpc request wasting drpc server memory and spout/bolt computing resource.

        You can merge this pull request into a Git repository by running:

        $ git pull https://github.com/apache/incubator-storm STORM-63

        Alternatively you can review and apply these changes as the patch at:

        https://github.com/apache/incubator-storm/pull/33.patch

        To close this pull request, make a commit to your master/trunk branch
        with (at least) the following in the commit message:

        This closes #33


        commit 0108765688e64fa7c039e9478b303c118882d972
        Author: Kang Xiao <kxiao.tiger@gmail.com>
        Date: 2014-02-16T15:12:54Z

        STORM-63 remove timeout drpc request from its function's request queue


        Show
        githubbot ASF GitHub Bot added a comment - GitHub user xiaokang opened a pull request: https://github.com/apache/incubator-storm/pull/33 STORM-63 remove timeout drpc request from its function's request queue It's for jira issue https://issues.apache.org/jira/browse/STORM-63 . It prevent a timed-out drpc request wasting drpc server memory and spout/bolt computing resource. You can merge this pull request into a Git repository by running: $ git pull https://github.com/apache/incubator-storm STORM-63 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-storm/pull/33.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #33 commit 0108765688e64fa7c039e9478b303c118882d972 Author: Kang Xiao <kxiao.tiger@gmail.com> Date: 2014-02-16T15:12:54Z STORM-63 remove timeout drpc request from its function's request queue
        Hide
        xiaokang Kang Xiao added a comment -
        Show
        xiaokang Kang Xiao added a comment - PR is sent for this issue. https://github.com/apache/incubator-storm/pull/33
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user revans2 commented on a diff in the pull request:

        https://github.com/apache/incubator-storm/pull/33#discussion_r10046165

        — Diff: storm-core/src/clj/backtype/storm/daemon/drpc.clj —
        @@ -59,6 +63,8 @@
        (when-let [sem (@id->sem id)]
        (swap! id->result assoc id (DRPCExecutionException. "Request timed out"))
        (.release sem))
        + (.remove (acquire-queue request-queues (@id->function id)) (@id->request id))
        — End diff –

        Either your indentation is wrong here or you have put the remove in the incorrect place. These are outside of the when-let block but have the indentation of items in it. If sem has been removed then we don't need to remove it from the queue, although it should not hurt anything in that case. And the log line should be OK either way, but preferably in the when-let.

        Show
        githubbot ASF GitHub Bot added a comment - Github user revans2 commented on a diff in the pull request: https://github.com/apache/incubator-storm/pull/33#discussion_r10046165 — Diff: storm-core/src/clj/backtype/storm/daemon/drpc.clj — @@ -59,6 +63,8 @@ (when-let [sem (@id->sem id)] (swap! id->result assoc id (DRPCExecutionException. "Request timed out")) (.release sem)) + (.remove (acquire-queue request-queues (@id->function id)) (@id->request id)) — End diff – Either your indentation is wrong here or you have put the remove in the incorrect place. These are outside of the when-let block but have the indentation of items in it. If sem has been removed then we don't need to remove it from the queue, although it should not hurt anything in that case. And the log line should be OK either way, but preferably in the when-let.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user revans2 commented on a diff in the pull request:

        https://github.com/apache/incubator-storm/pull/33#discussion_r10046618

        — Diff: storm-core/src/clj/backtype/storm/daemon/drpc.clj —
        @@ -59,6 +63,8 @@
        (when-let [sem (@id->sem id)]
        (swap! id->result assoc id (DRPCExecutionException. "Request timed out"))
        (.release sem))
        + (.remove (acquire-queue request-queues (@id->function id)) (@id->request id))
        + (log-warn "Timeout DRPC request id: " id " start at " start)
        (cleanup id)
        — End diff –

        I think there is a race condition here, although you didn't cause it. Between releasing the sem, cleaning up the atoms and fetching the result. It looks like it is possible on a timeout that the sem will be released but by the time the thrift thread actually wakes up the result may have been removed, causing a null to be returned instead of an exception being thrown.

        I am mostly curious if you have seen this happen in practice, because you are fixing issues related to timeouts. Either way I think I will file a JIRA for this.

        Show
        githubbot ASF GitHub Bot added a comment - Github user revans2 commented on a diff in the pull request: https://github.com/apache/incubator-storm/pull/33#discussion_r10046618 — Diff: storm-core/src/clj/backtype/storm/daemon/drpc.clj — @@ -59,6 +63,8 @@ (when-let [sem (@id->sem id)] (swap! id->result assoc id (DRPCExecutionException. "Request timed out")) (.release sem)) + (.remove (acquire-queue request-queues (@id->function id)) (@id->request id)) + (log-warn "Timeout DRPC request id: " id " start at " start) (cleanup id) — End diff – I think there is a race condition here, although you didn't cause it. Between releasing the sem, cleaning up the atoms and fetching the result. It looks like it is possible on a timeout that the sem will be released but by the time the thrift thread actually wakes up the result may have been removed, causing a null to be returned instead of an exception being thrown. I am mostly curious if you have seen this happen in practice, because you are fixing issues related to timeouts. Either way I think I will file a JIRA for this.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user xiaokang commented on a diff in the pull request:

        https://github.com/apache/incubator-storm/pull/33#discussion_r10388051

        — Diff: storm-core/src/clj/backtype/storm/daemon/drpc.clj —
        @@ -59,6 +63,8 @@
        (when-let [sem (@id->sem id)]
        (swap! id->result assoc id (DRPCExecutionException. "Request timed out"))
        (.release sem))
        + (.remove (acquire-queue request-queues (@id->function id)) (@id->request id))
        — End diff –

        HI @revans2, it's really a mistake to put the remove and log code outside of when-let block. I'll fix it.

        Show
        githubbot ASF GitHub Bot added a comment - Github user xiaokang commented on a diff in the pull request: https://github.com/apache/incubator-storm/pull/33#discussion_r10388051 — Diff: storm-core/src/clj/backtype/storm/daemon/drpc.clj — @@ -59,6 +63,8 @@ (when-let [sem (@id->sem id)] (swap! id->result assoc id (DRPCExecutionException. "Request timed out")) (.release sem)) + (.remove (acquire-queue request-queues (@id->function id)) (@id->request id)) — End diff – HI @revans2, it's really a mistake to put the remove and log code outside of when-let block. I'll fix it.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user xiaokang commented on a diff in the pull request:

        https://github.com/apache/incubator-storm/pull/33#discussion_r10388407

        — Diff: storm-core/src/clj/backtype/storm/daemon/drpc.clj —
        @@ -59,6 +63,8 @@
        (when-let [sem (@id->sem id)]
        (swap! id->result assoc id (DRPCExecutionException. "Request timed out"))
        (.release sem))
        + (.remove (acquire-queue request-queues (@id->function id)) (@id->request id))
        + (log-warn "Timeout DRPC request id: " id " start at " start)
        (cleanup id)
        — End diff –

        Yes, we have seen this happen in practice. In fact it's much possible to to return null instead of throw an exception when drpc request is timeout. For example a python drpc client will print "execute failed: unknown result".

        Show
        githubbot ASF GitHub Bot added a comment - Github user xiaokang commented on a diff in the pull request: https://github.com/apache/incubator-storm/pull/33#discussion_r10388407 — Diff: storm-core/src/clj/backtype/storm/daemon/drpc.clj — @@ -59,6 +63,8 @@ (when-let [sem (@id->sem id)] (swap! id->result assoc id (DRPCExecutionException. "Request timed out")) (.release sem)) + (.remove (acquire-queue request-queues (@id->function id)) (@id->request id)) + (log-warn "Timeout DRPC request id: " id " start at " start) (cleanup id) — End diff – Yes, we have seen this happen in practice. In fact it's much possible to to return null instead of throw an exception when drpc request is timeout. For example a python drpc client will print "execute failed: unknown result".
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user revans2 commented on the pull request:

        https://github.com/apache/incubator-storm/pull/33#issuecomment-42333682

        Sorry it took me so long to get back to you on this change. I am +1

        Show
        githubbot ASF GitHub Bot added a comment - Github user revans2 commented on the pull request: https://github.com/apache/incubator-storm/pull/33#issuecomment-42333682 Sorry it took me so long to get back to you on this change. I am +1
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user xiaokang commented on the pull request:

        https://github.com/apache/incubator-storm/pull/33#issuecomment-42909098

        Thanks @revans2 for your comments.

        Show
        githubbot ASF GitHub Bot added a comment - Github user xiaokang commented on the pull request: https://github.com/apache/incubator-storm/pull/33#issuecomment-42909098 Thanks @revans2 for your comments.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user d2r commented on the pull request:

        https://github.com/apache/incubator-storm/pull/33#issuecomment-43140198

        @xiaokang, Looks good. I wrote a quick unit test for this, and sent a [pull request](https://github.com/xiaokang/incubator-storm/pull/1) to you. Could you take a look?

        If not, I am happy to add the test sometime later.

        Show
        githubbot ASF GitHub Bot added a comment - Github user d2r commented on the pull request: https://github.com/apache/incubator-storm/pull/33#issuecomment-43140198 @xiaokang, Looks good. I wrote a quick unit test for this, and sent a [pull request] ( https://github.com/xiaokang/incubator-storm/pull/1 ) to you. Could you take a look? If not, I am happy to add the test sometime later.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user d2r commented on the pull request:

        https://github.com/apache/incubator-storm/pull/33#issuecomment-44691802

        I am +1. I will add a unit test in a separate JIRA. We should fix the leak now.

        Show
        githubbot ASF GitHub Bot added a comment - Github user d2r commented on the pull request: https://github.com/apache/incubator-storm/pull/33#issuecomment-44691802 I am +1. I will add a unit test in a separate JIRA. We should fix the leak now.
        Show
        githubbot ASF GitHub Bot added a comment - Github user d2r commented on the pull request: https://github.com/apache/incubator-storm/pull/33#issuecomment-44841346 See https://issues.apache.org/jira/browse/STORM-335
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user asfgit closed the pull request at:

        https://github.com/apache/incubator-storm/pull/33

        Show
        githubbot ASF GitHub Bot added a comment - Github user asfgit closed the pull request at: https://github.com/apache/incubator-storm/pull/33
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user revans2 commented on the pull request:

        https://github.com/apache/incubator-storm/pull/33#issuecomment-45356400

        @xiaokang Thanks, I merged this into master.

        Show
        githubbot ASF GitHub Bot added a comment - Github user revans2 commented on the pull request: https://github.com/apache/incubator-storm/pull/33#issuecomment-45356400 @xiaokang Thanks, I merged this into master.

          People

          • Assignee:
            xiaokang Kang Xiao
            Reporter:
            xumingming James Xu
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development