Description
A Java SASL/OAUTHBEARER client (either a non-broker producer/consumer client or a broker acting as an inter-broker client) can end up in a state where it cannot connect to a new broker. If re-authentication, as implemented by KIP-368 and merged for v2.2.0, were deployed and enabled, the client would likewise be unable to re-authenticate. The error message looks like this:
Connection to node 1 failed authentication due to: An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: Unable to find OAuth Bearer token in Subject's private credentials (size=2) [Caused by java.io.IOException: Unable to find OAuth Bearer token in Subject's private credentials (size=2)]) occurred when evaluating SASL token received from the Kafka Broker. Kafka Client will go to AUTHENTICATION_FAILED state.
The root cause of the problem begins at this point in the code:
The loginContext field is not restored to the old version stored away in the optionalLoginContextToLogout variable if the loginContext.login() call on line 381 throws an exception. This is an unusual event, since the OAuth authorization server must be unavailable at the moment the token refresh occurs. When it does happen, it puts the refresher thread instance in an invalid state: its loginContext field now refers to the login context that failed instead of the original one, and the reference to the original is lost. The current loginContext can't be logged out (attempting it throws an InvalidStateException because there is no token associated with it), and the token associated with the lost login context can never be logged out and removed from the Subject's private credentials, because no reference to that login context is retained. The net effect is an extra token on the Subject's private credentials, which eventually results in the exception mentioned above when the client tries to authenticate to a broker.
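The failure mode described above can be reproduced in a small model. This is only a sketch under stated assumptions, not the actual Kafka refresher code: ReLoginSketch, FakeLoginContext, Refresher, and the restoreOnFailure flag are hypothetical names introduced for illustration, and the exception thrown by the fake logout() is a stand-in. Only javax.security.auth.Subject and LoginException are real JDK types.

```java
import javax.security.auth.Subject;
import javax.security.auth.login.LoginException;

/** Hypothetical model of the refresher's re-login step; not the actual Kafka code. */
final class ReLoginSketch {

    /** Stand-in for a JAAS login context bound to a shared Subject. */
    static final class FakeLoginContext {
        private final Subject subject;
        private final boolean serverAvailable;
        private Object token; // stays null until login() succeeds

        FakeLoginContext(Subject subject, boolean serverAvailable) {
            this.subject = subject;
            this.serverAvailable = serverAvailable;
        }

        void login() throws LoginException {
            if (!serverAvailable) {
                throw new LoginException("authorization server unavailable");
            }
            token = new Object();
            subject.getPrivateCredentials().add(token);
        }

        void logout() {
            if (token == null) {
                // Stand-in for the exception raised when no token is associated.
                throw new IllegalStateException("no token associated with this login context");
            }
            subject.getPrivateCredentials().remove(token);
            token = null;
        }
    }

    /** Stand-in for the refresher thread's state. */
    static final class Refresher {
        private final Subject subject = new Subject();
        private final boolean restoreOnFailure; // hypothetical switch: false = buggy, true = fixed
        private FakeLoginContext loginContext;

        Refresher(boolean restoreOnFailure) {
            this.restoreOnFailure = restoreOnFailure;
            this.loginContext = new FakeLoginContext(subject, true);
            try {
                loginContext.login(); // initial login puts the first token on the Subject
            } catch (LoginException e) {
                throw new IllegalStateException(e);
            }
        }

        /** Returns true if the refresh obtained a new token. */
        boolean reLogin(boolean serverAvailable) {
            FakeLoginContext contextToLogout = loginContext;
            loginContext = new FakeLoginContext(subject, serverAvailable);
            try {
                loginContext.login();
            } catch (LoginException e) {
                if (restoreOnFailure) {
                    loginContext = contextToLogout; // the step that is missing in the buggy path
                }
                return false;
            }
            try {
                contextToLogout.logout();
            } catch (IllegalStateException e) {
                // Buggy path: contextToLogout is the context that failed earlier,
                // so the original token is never removed from the Subject.
            }
            return true;
        }

        int tokenCount() {
            return subject.getPrivateCredentials().size();
        }
    }
}
```

In this model, new ReLoginSketch.Refresher(false) followed by reLogin(false) and then reLogin(true) leaves two tokens on the Subject, while the same sequence on new ReLoginSketch.Refresher(true) leaves one. Restoring the previous login context in the failure handler is one possible remedy, shown here only to make the contrast visible.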
So the chain of events is:
1) A login failure upon token refresh leaves the refresher thread's login context field pointing at the failed login context, and the existing token on the Subject's private credentials will never be logged out/removed.
2) A retry occurs in 10 seconds, potentially repeatedly, until the authorization server is back online.
3) A login succeeds, adding a second token to the Subject's private credentials. Logout is then called on the login context that was set incorrectly in the most recent failure (e.g. in step 1), which results in an exception, but this is not the real issue. The real issue is the 2 tokens on the Subject's private credentials.
4) At this point there are 2 tokens on the Subject. At some point in the future the client tries to make a new connection, sees the 2 tokens, and throws an exception. BOOM! The client is now unable to connect (or re-authenticate, if applicable) going forward.
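Step 4 can be illustrated with a lookup that expects exactly one token on the Subject. This is a sketch inferred from the error message quoted earlier, not the actual client code: SingleTokenLookupSketch, FakeToken, and findToken are hypothetical names, and the exactly-one check is an assumption about how the lookup behaves.

```java
import java.io.IOException;
import java.util.Set;
import javax.security.auth.Subject;

/** Hypothetical model of a token lookup that tolerates exactly one token. */
final class SingleTokenLookupSketch {

    /** Stand-in for an OAuth bearer token credential. */
    static final class FakeToken {
        final String value;

        FakeToken(String value) {
            this.value = value;
        }
    }

    /** Returns the single token on the Subject, or fails if there are zero or several. */
    static FakeToken findToken(Subject subject) throws IOException {
        Set<FakeToken> tokens = subject.getPrivateCredentials(FakeToken.class);
        if (tokens.size() != 1) {
            // Assumption: the lookup rejects anything other than exactly one token.
            throw new IOException(String.format(
                    "Unable to find OAuth Bearer token in Subject's private credentials (size=%d)",
                    tokens.size()));
        }
        return tokens.iterator().next();
    }
}
```

With one FakeToken on the Subject, findToken returns it. Once a second token has been added and the first was never removed, every later call fails, which matches the observation that the client stays unable to connect from then on.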