Uploaded image for project: 'MINA'
  1. MINA
  2. DIRMINA-1027

SSLHandler writes corrupt messages under heavy load

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.0.11
    • Fix Version/s: 2.0.13
    • Component/s: SSL
    • Labels:
      None

      Description

      I’m facing a critical problem in my project with an MINA-stack including SSL. My Protocol-IoFilterAdapter receives corrupt messages under heavy load (JMeter-Benchmark with 300 Threads and round about 5-10 MINA-calls per thread). The Problem disappeared without SSL, so I debugged for some days, and I think the problem occurred because of the changes made for fixing the issue DIRMINA-1019 (commit). I think the problem is that the SSLHandler is calling messageReceived while also writing data into the next filter, which is writing inside the messageReceivedEventQueue. With the fix provided inside the race-condition-ticket, it works like a charm. So I think that the answer of the question "Why the second loop is also protected by the loop?" is: because otherwise the messageReceivedEventQueue gets corrupted data...

      (I hope I’m not wrong and anybody can follow my thoughts and maybe bad English)

        Activity

        Hide
        elecharny Emmanuel Lecharny added a comment -

        Hi,

        I'm not sure I get it. Is the patch fixing your issue, or do you still have your corruption issue ? Your sentence "With the fix provided inside the race-condition-ticket, it works like a charm" seems to mean that the fix is correct...

        Show
        elecharny Emmanuel Lecharny added a comment - Hi, I'm not sure I get it. Is the patch fixing your issue, or do you still have your corruption issue ? Your sentence "With the fix provided inside the race-condition-ticket, it works like a charm" seems to mean that the fix is correct...
        Hide
        Coulton Michael Kohlsche added a comment -

        Sorry for been not clear enough. The provided fix from Terence Marks works for me. Not the code in the current release or from the trunk.

        Show
        Coulton Michael Kohlsche added a comment - Sorry for been not clear enough. The provided fix from Terence Marks works for me. Not the code in the current release or from the trunk.
        Hide
        elecharny Emmanuel Lecharny added a comment - - edited

        I think I have applied the patch in the 2.0 branch, so the next release (and even 2.0.11) should be ok. Can you confirm that ? (or test the 2.0.12 whichis available on people.apache.org/~elecharny)

        Show
        elecharny Emmanuel Lecharny added a comment - - edited I think I have applied the patch in the 2.0 branch, so the next release (and even 2.0.11) should be ok. Can you confirm that ? (or test the 2.0.12 whichis available on people.apache.org/~elecharny)
        Hide
        Coulton Michael Kohlsche added a comment - - edited

        I’m sorry again for been not clear enough. The problem occurs in the releases 2.0.9, 2.0.11 and with my own build from the 2.0-branch. The (in my opinion) problematic method looks the same in all these builds:

        SslHandler.java
            /* no qualifier */void flushScheduledEvents() {
                // Fire events only when the lock is available for this handler.
                IoFilterEvent event;
                try {
                    sslLock.lock();
        
                    // We need synchronization here inevitably because filterWrite can be
                    // called simultaneously and cause 'bad record MAC' integrity error.
                    while ((event = filterWriteEventQueue.poll()) != null) {
                        NextFilter nextFilter = event.getNextFilter();
                        nextFilter.filterWrite(session, (WriteRequest) event.getParameter());
                    }
                } finally {
                    sslLock.unlock();
                }
        
                while ((event = messageReceivedEventQueue.poll()) != null) {
                    NextFilter nextFilter = event.getNextFilter();
                    nextFilter.messageReceived(session, event.getParameter());
                }
            }
        

        After changing this part to something like Terence Marks provided in his fix.java inside the ticket DIRMINA-1019 the problem disappeared. My working solution looks like this:

        SslHandler.java
          private final AtomicInteger scheduled_events = new AtomicInteger(0);
        ...
        
          /* no qualifier */void flushScheduledEvents() {
        
            scheduled_events.incrementAndGet();
        
            // Fire events only when the lock is available for this handler.
            if (sslLock.tryLock()) {
        
              IoFilterEvent event;
        
              // We need synchronization here inevitably because filterWrite can be
              // called simultaneously and cause 'bad record MAC' integrity error.
              try {
                do {
                  while ((event = filterWriteEventQueue.poll()) != null) {
                    final NextFilter nextFilter = event.getNextFilter();
                    nextFilter.filterWrite(session, (WriteRequest)event.getParameter());
                  }
        
                  while ((event = messageReceivedEventQueue.poll()) != null) {
                    final NextFilter nextFilter = event.getNextFilter();
                    nextFilter.messageReceived(session, event.getParameter());
                  }
                }
                while (scheduled_events.decrementAndGet() > 0);
              }
              finally {
                sslLock.unlock();
              }
            }
          }
        

        I’m not sure if this fixes the real problem or if it just hides the symptoms.

        Show
        Coulton Michael Kohlsche added a comment - - edited I’m sorry again for been not clear enough. The problem occurs in the releases 2.0.9, 2.0.11 and with my own build from the 2.0-branch. The (in my opinion) problematic method looks the same in all these builds: SslHandler.java /* no qualifier */void flushScheduledEvents() { // Fire events only when the lock is available for this handler. IoFilterEvent event; try { sslLock.lock(); // We need synchronization here inevitably because filterWrite can be // called simultaneously and cause 'bad record MAC' integrity error. while ((event = filterWriteEventQueue.poll()) != null ) { NextFilter nextFilter = event.getNextFilter(); nextFilter.filterWrite(session, (WriteRequest) event.getParameter()); } } finally { sslLock.unlock(); } while ((event = messageReceivedEventQueue.poll()) != null ) { NextFilter nextFilter = event.getNextFilter(); nextFilter.messageReceived(session, event.getParameter()); } } After changing this part to something like Terence Marks provided in his fix.java inside the ticket DIRMINA-1019 the problem disappeared. My working solution looks like this: SslHandler.java private final AtomicInteger scheduled_events = new AtomicInteger(0); ... /* no qualifier */void flushScheduledEvents() { scheduled_events.incrementAndGet(); // Fire events only when the lock is available for this handler. if (sslLock.tryLock()) { IoFilterEvent event; // We need synchronization here inevitably because filterWrite can be // called simultaneously and cause 'bad record MAC' integrity error. try { do { while ((event = filterWriteEventQueue.poll()) != null ) { final NextFilter nextFilter = event.getNextFilter(); nextFilter.filterWrite(session, (WriteRequest)event.getParameter()); } while ((event = messageReceivedEventQueue.poll()) != null ) { final NextFilter nextFilter = event.getNextFilter(); nextFilter.messageReceived(session, event.getParameter()); } } while (scheduled_events.decrementAndGet() > 0); } finally { sslLock.unlock(); } } } I’m not sure if this fixes the real problem or if it just hides the symptoms.
        Hide
        elecharny Emmanuel Lecharny added a comment -

        Ah, ok, it's clearer now !

        Yeah, I assumed it was not necessary to wrap the messageReceived part in the protected loop. It was probably too much an assumption.

        Let me check that.

        Show
        elecharny Emmanuel Lecharny added a comment - Ah, ok, it's clearer now ! Yeah, I assumed it was not necessary to wrap the messageReceived part in the protected loop. It was probably too much an assumption. Let me check that.
        Hide
        elecharny Emmanuel Lecharny added a comment - - edited

        I have applied the suggested patch fully. Can you give it a try using the packages I have created and posted on http://people.apache.org/~elecharny ?

        Show
        elecharny Emmanuel Lecharny added a comment - - edited I have applied the suggested patch fully. Can you give it a try using the packages I have created and posted on http://people.apache.org/~elecharny ?
        Hide
        Coulton Michael Kohlsche added a comment -

        Thx for the test-build. I will give it a try as soon as possible.

        Show
        Coulton Michael Kohlsche added a comment - Thx for the test-build. I will give it a try as soon as possible.
        Show
        elecharny Emmanuel Lecharny added a comment - Should be fixed with http://git-wip-us.apache.org/repos/asf/mina/commit/1d437477

          People

          • Assignee:
            Unassigned
            Reporter:
            Coulton Michael Kohlsche
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development