Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-16711

YarnShuffleService doesn't re-init properly on YARN rolling upgrade

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.5.2
    • Fix Version/s: 2.0.1, 2.1.0
    • Component/s: Shuffle, YARN
    • Labels:
      None

      Description

      When a yarn rolling upgrade happens the Spark YarnShuffleService isn't re-initializing the tokens soon enough which causes running applications to fail with NullPointerExceptions rather then IOExceptions which causes clients to not retry which in turn causes the application to totally fail when it should have just retried and succeeded.

      2016-07-22 23:22:05,460 [shuffle-server-1] ERROR server.TransportRequestHandler: Error while invoking RpcHandler#receive() on RPC id 6235606084052282795
      java.lang.NullPointerException: Password cannot be null if SASL is enabled
      at org.spark-project.guava.base.Preconditions.checkNotNull(Preconditions.java:208)
      at org.apache.spark.network.sasl.SparkSaslServer.encodePassword(SparkSaslServer.java:196)
      at org.apache.spark.network.sasl.SparkSaslServer$DigestCallbackHandler.handle(SparkSaslServer.java:166)
      at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:589)
      at com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244)
      at org.apache.spark.network.sasl.SparkSaslServer.response(SparkSaslServer.java:119)
      at org.apache.spark.network.sasl.SaslRpcHandler.receive(SaslRpcHandler.java:101)
      at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:149)
      at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:102)
      at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:104)
      at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
      at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
      at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
      at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
      at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
      at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
      at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
      at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
      at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
      at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
      at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:86)
      at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
      at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
      at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
      at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
      at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
      at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
      at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
      at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
      at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
      at java.lang.Thread.run(Thread.java:745)

        Attachments

          Activity

            People

            • Assignee:
              tgraves Thomas Graves
              Reporter:
              tgraves Thomas Graves
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: