Camel
  1. Camel
  2. CAMEL-5058

Bug: Unique Endpoints Leaking in DefaultInflightRepository

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 2.9.0
    • Fix Version/s: 2.9.2, 2.10.0
    • Component/s: camel-core
    • Labels:
      None
    • Estimated Complexity:
      Unknown

      Description

      If you have an endpoint protocol which uses unique URIs you will leak Strings in the HashMap stored in the DefaultInflightRepository (org.apache.camel.impl.DefaultInflightRepository)

      It seems there is a reference counting scheme in place, but it doesn't do a remove until the "stop" method is called to shut the system down. We are running XMPP endpoints, which use a protocol like xmpp://someaccount@domain/password?to=someOtherAccount
      When there are 10 million accounts, not all of which are active, but all of which may message at some time or another, no references are removed to the endpointCount.

      When the count becomes 0, the reference should be removed and the size method will still return the appropriate result.

      Please be careful in the implementation to synchronize on some object (perhaps the AtomicInteger) reflecting a read/write lock on the endpoint count modification.

      1. fixLeak.diff
        6 kB
        Zach Calvert
      2. fixLeak.diff
        6 kB
        Zach Calvert

        Activity

        Hide
        Claus Ibsen added a comment -

        The inflight registry should be route based instead of endpoint, as that is what the graceful shutdown really needs.
        So instead of being endpoint based, we can switch to be route id, based instead.

        Can you post some more details about your use-case, I wonder why you get so many unique endpoints in the registry, as its consumer based. Are you adding a lot of new routes or the likes?

        Show
        Claus Ibsen added a comment - The inflight registry should be route based instead of endpoint, as that is what the graceful shutdown really needs. So instead of being endpoint based, we can switch to be route id, based instead. Can you post some more details about your use-case, I wonder why you get so many unique endpoints in the registry, as its consumer based. Are you adding a lot of new routes or the likes?
        Hide
        Claus Ibsen added a comment -

        I have committed a fix that removes entries from the registry when removing a route.

        We need to keep it endpoint key based due producer template setup a UoW which essentially is a exchange created outside a route, and therefore we do not have a routeId at this given moment.

        Show
        Claus Ibsen added a comment - I have committed a fix that removes entries from the registry when removing a route. We need to keep it endpoint key based due producer template setup a UoW which essentially is a exchange created outside a route, and therefore we do not have a routeId at this given moment.
        Hide
        Babak Vahdat added a comment -
        Show
        Babak Vahdat added a comment - Just for the sake of not losing the revision history: trunk: http://svn.apache.org/viewvc?view=revision&revision=1296653 2.9.x: http://svn.apache.org/viewvc?view=revision&revision=1296655
        Hide
        Claus Ibsen added a comment -

        Ah I was 1000 numbers behind in the commit message

        Show
        Claus Ibsen added a comment - Ah I was 1000 numbers behind in the commit message
        Hide
        Zach Calvert added a comment -

        We construct endpoints at runtime to send a message to the XMPP subsystem. It isn't ideal, but it works for us. With the memory leak, however, it has to be restarted once a week or so.

        Thanks for the fast fix!

        Show
        Zach Calvert added a comment - We construct endpoints at runtime to send a message to the XMPP subsystem. It isn't ideal, but it works for us. With the memory leak, however, it has to be restarted once a week or so. Thanks for the fast fix!
        Hide
        Zach Calvert added a comment -

        I'm not quite sure this fixes our problem. We aren't creating a true route. We are using a ProducerTemplate to send a Message to an Endpoint, and the Endpoint is unique, created at runtime, and would essentially create a new entry in the ConcurrentHashMap stored in the DefaultInflightRepository.

        Ideally, when remove is called with an Exchange, if the reference count becomes 0, then the entry is removed from the Map.

        Show
        Zach Calvert added a comment - I'm not quite sure this fixes our problem. We aren't creating a true route. We are using a ProducerTemplate to send a Message to an Endpoint, and the Endpoint is unique, created at runtime, and would essentially create a new entry in the ConcurrentHashMap stored in the DefaultInflightRepository. Ideally, when remove is called with an Exchange, if the reference count becomes 0, then the entry is removed from the Map.
        Hide
        Claus Ibsen added a comment -

        Zach,

        Yes I suspected there was still a problem for you. But I didn't have a chance to ask how you send the messages. So just to be sure, you use a producer template to send to dynamic endpoint uris.

        Show
        Claus Ibsen added a comment - Zach, Yes I suspected there was still a problem for you. But I didn't have a chance to ask how you send the messages. So just to be sure, you use a producer template to send to dynamic endpoint uris.
        Hide
        Claus Ibsen added a comment -

        Notice ideally the inflight registry would just track for routes only. But the producer template uses an UoW as well for some reason we had to do a while ago. So it will also enlist itself in the inflight registry (well its in fact also an inflight message you may say).

        So we can either ensure to remove from the inflight map if it become zero as you suggest. This may require a bit of locking, as you need to remove and compare.

        Alternative, as the map, does not need to be removed in real time, we can have a background thread, doing the cleanup in a scheduled fashion.

        Show
        Claus Ibsen added a comment - Notice ideally the inflight registry would just track for routes only. But the producer template uses an UoW as well for some reason we had to do a while ago. So it will also enlist itself in the inflight registry (well its in fact also an inflight message you may say). So we can either ensure to remove from the inflight map if it become zero as you suggest. This may require a bit of locking, as you need to remove and compare. Alternative, as the map, does not need to be removed in real time, we can have a background thread, doing the cleanup in a scheduled fashion.
        Hide
        Zach Calvert added a comment -

        Claus, that is correct that I'm using the producer template to send the messages. I actually have a solution that I'm currently testing, but I've got to wait for our legal department to review the change to submit back to the community (we've got a big process I have to follow).

        Hopefully I will be able to submit the patch, including a unit test, to validate the change.

        Show
        Zach Calvert added a comment - Claus, that is correct that I'm using the producer template to send the messages. I actually have a solution that I'm currently testing, but I've got to wait for our legal department to review the change to submit back to the community (we've got a big process I have to follow). Hopefully I will be able to submit the patch, including a unit test, to validate the change.
        Hide
        Zach Calvert added a comment -

        The diff to fix the leak. Note the unit test runs 2,000,000 operations to validate the fix. It runs in 1 second on my computer, so the bottleneck due to the synchronized is limited.

        Show
        Zach Calvert added a comment - The diff to fix the leak. Note the unit test runs 2,000,000 operations to validate the fix. It runs in 1 second on my computer, so the bottleneck due to the synchronized is limited.
        Hide
        Claus Ibsen added a comment -

        Zach thanks for the patch. Can you re-attach the patch and make sure the mark [x] in grant license to ASF in the file attachment wizard. We need this to be able to accept your patch.

        Show
        Claus Ibsen added a comment - Zach thanks for the patch. Can you re-attach the patch and make sure the mark [x] in grant license to ASF in the file attachment wizard. We need this to be able to accept your patch.
        Hide
        Zach Calvert added a comment -

        Re-attaching code with agreement to allow the Apache License

        Show
        Zach Calvert added a comment - Re-attaching code with agreement to allow the Apache License
        Hide
        Zach Calvert added a comment -

        Done

        Show
        Zach Calvert added a comment - Done
        Hide
        Claus Ibsen added a comment -

        Thanks for the patch.

        Show
        Claus Ibsen added a comment - Thanks for the patch.
        Hide
        Babak Vahdat added a comment -

        I did polish the provided unit-test a bit mostly because of the required visibility of the failed flag (volatile keyword).
        Especially required while running tests on the multi-core boxes.

        Show
        Babak Vahdat added a comment - I did polish the provided unit-test a bit mostly because of the required visibility of the failed flag (volatile keyword). Especially required while running tests on the multi-core boxes.
        Hide
        Zach Calvert added a comment -

        Yea, my bad. I remembered I had some cleanup to do on that hack-n-slash unit test last night. Ugly, ugly unit test...

        Thanks for the cleanup Babak!

        Show
        Zach Calvert added a comment - Yea, my bad. I remembered I had some cleanup to do on that hack-n-slash unit test last night. Ugly, ugly unit test... Thanks for the cleanup Babak!
        Hide
        Claus Ibsen added a comment -

        I improved this to avoid the synchronization in CAMEL-5057. As well only to track in flight per routes, as per endpoint is not needed.

        Show
        Claus Ibsen added a comment - I improved this to avoid the synchronization in CAMEL-5057 . As well only to track in flight per routes, as per endpoint is not needed.

          People

          • Assignee:
            Claus Ibsen
            Reporter:
            Zach Calvert
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated:
              Original Estimate - 24h
              24h
              Remaining:
              Remaining Estimate - 24h
              24h
              Logged:
              Time Spent - Not Specified
              Not Specified

                Development