The Spark Thrift Server throws an exception when SASL encryption is used:
{code}
18/04/16 14:36:46 ERROR TransportRequestHandler: Error while invoking RpcHandler#receive() on RPC id 8384069538832556183
java.lang.IllegalArgumentException: A secret key must be specified via the spark.authenticate.secret config
	at org.apache.spark.SecurityManager$$anonfun$getSecretKey$4.apply(SecurityManager.scala:510)
	at org.apache.spark.SecurityManager$$anonfun$getSecretKey$4.apply(SecurityManager.scala:510)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.SecurityManager.getSecretKey(SecurityManager.scala:509)
	at org.apache.spark.SecurityManager.getSecretKey(SecurityManager.scala:551)
	at org.apache.spark.network.sasl.SparkSaslServer$DigestCallbackHandler.handle(SparkSaslServer.java:166)
	at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:589)
	at com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244)
	at org.apache.spark.network.sasl.SparkSaslServer.response(SparkSaslServer.java:119)
	at org.apache.spark.network.sasl.SaslRpcHandler.receive(SaslRpcHandler.java:103)
	at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:187)
	at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:111)
{code}
After investigation, the issue turns out to be:
Spark on YARN stores the SASL secret in the current UGI's credentials, and these credentials are distributed to the AM and executors so that the executors and the driver share the same secret for communication. However, the STS/Hive library code refreshes the current UGI via UGI's loginUserFromKeytab(), which creates a new UGI in the current context with empty tokens and secret keys. The secret key is therefore lost from the current context's UGI, which is why the Spark driver throws the "secret key must be specified" exception above.
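A minimal sketch of how the secret gets lost, using the plain Hadoop UGI APIs (the alias name, principal, and keytab path below are illustrative placeholders, not Spark's actual constants, and this assumes no surrounding doAs block):

{code:scala}
import org.apache.hadoop.io.Text
import org.apache.hadoop.security.{Credentials, UserGroupInformation}

object SecretLossDemo {
  def main(args: Array[String]): Unit = {
    // Hypothetical alias; Spark's real internal lookup key differs.
    val alias = new Text("sparkSaslSecret")

    // Spark on YARN stores the SASL secret in the current UGI's credentials;
    // YARN then ships these credentials to the AM and executors.
    val creds = new Credentials()
    creds.addSecretKey(alias, "shared-secret".getBytes("UTF-8"))
    UserGroupInformation.getCurrentUser.addCredentials(creds)

    // The secret is visible from the current context at this point.
    assert(UserGroupInformation.getCurrentUser
      .getCredentials.getSecretKey(alias) != null)

    // A keytab login (as done by the STS/Hive code) installs a brand-new UGI
    // with empty tokens and secret keys, so the secret is no longer visible.
    UserGroupInformation.loginUserFromKeytab(
      "hive/host@EXAMPLE.COM", "/path/to/hive.keytab")
    assert(UserGroupInformation.getCurrentUser
      .getCredentials.getSecretKey(alias) == null)
  }
}
{code}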
In the Spark 2.2 code, Spark also stores this secret key in a SecurityManager class variable, so even if the UGI is refreshed, the secret still exists in that object, and STS with SASL still works in Spark 2.2. But in Spark 2.3 the secret is always looked up from the current UGI, which makes STS with SASL fail to work in Spark 2.3.
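A simplified sketch of the behavioral difference (a paraphrase of the two code paths, not Spark's exact code; SECRET_LOOKUP_KEY and readSecretFromUgi are stand-ins):

{code:scala}
import java.nio.charset.StandardCharsets.UTF_8
import org.apache.hadoop.io.Text
import org.apache.hadoop.security.UserGroupInformation

object SecurityManagerSketch {
  // Stand-in for Spark's internal lookup key; the real constant differs.
  val SECRET_LOOKUP_KEY = new Text("sparkSaslSecret")

  private def readSecretFromUgi(): Option[String] =
    Option(UserGroupInformation.getCurrentUser
        .getCredentials.getSecretKey(SECRET_LOOKUP_KEY))
      .map(new String(_, UTF_8))

  // Spark 2.2 style: the secret is captured once into a field at construction,
  // so a later UGI refresh by the Hive code does not affect it.
  class SecurityManager22 {
    private val secretKey: Option[String] = readSecretFromUgi()
    def getSecretKey(): String = secretKey.getOrElse(
      throw new IllegalArgumentException(
        "A secret key must be specified via the spark.authenticate.secret config"))
  }

  // Spark 2.3 style: the secret is resolved from the current UGI on every
  // call, so it is gone after loginUserFromKeytab() replaces the UGI.
  class SecurityManager23 {
    def getSecretKey(): String = readSecretFromUgi().getOrElse(
      throw new IllegalArgumentException(
        "A secret key must be specified via the spark.authenticate.secret config"))
  }
}
{code}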
To fix this issue, there are two possible solutions:
1. Fix it in the STS/Hive library: when the UGI is refreshed, copy the secret key from the original UGI to the new one (see the sketch after this list). The difficulty is that some of the code that refreshes the UGI lives in the Hive library, which makes it hard for us to change.
2. Roll back the logic in SecurityManager to match Spark 2.2, so that the secret survives a UGI refresh.
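For reference, solution 1 would amount to something like the following around each re-login site (the helper, principal, and keytab path are placeholders, not existing Hive code):

{code:scala}
import org.apache.hadoop.security.UserGroupInformation

object ReloginPreservingSecret {
  def relogin(principal: String, keytab: String): Unit = {
    // Capture the credentials (including Spark's SASL secret) before the
    // Hive code replaces the login user...
    val oldCredentials = UserGroupInformation.getCurrentUser.getCredentials

    UserGroupInformation.loginUserFromKeytab(principal, keytab)

    // ...then copy them onto the freshly created UGI so the secret survives.
    UserGroupInformation.getCurrentUser.addCredentials(oldCredentials)
  }
}
{code}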
The 2nd solution seems the simpler one, so I will propose a PR implementing it.