Spark / SPARK-16711

YarnShuffleService doesn't re-init properly on YARN rolling upgrade


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.5.2
    • Fix Version/s: 2.0.1, 2.1.0
    • Component/s: Shuffle, Spark Core, YARN
    • Labels: None

    Description

      When a YARN rolling upgrade happens, the Spark YarnShuffleService does not re-initialize the tokens soon enough. Running applications then fail with NullPointerExceptions rather than IOExceptions. Because clients only retry on IOExceptions, they do not retry, and the whole application fails when it should simply have retried and succeeded.

      2016-07-22 23:22:05,460 [shuffle-server-1] ERROR server.TransportRequestHandler: Error while invoking RpcHandler#receive() on RPC id 6235606084052282795
      java.lang.NullPointerException: Password cannot be null if SASL is enabled
      at org.spark-project.guava.base.Preconditions.checkNotNull(Preconditions.java:208)
      at org.apache.spark.network.sasl.SparkSaslServer.encodePassword(SparkSaslServer.java:196)
      at org.apache.spark.network.sasl.SparkSaslServer$DigestCallbackHandler.handle(SparkSaslServer.java:166)
      at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:589)
      at com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244)
      at org.apache.spark.network.sasl.SparkSaslServer.response(SparkSaslServer.java:119)
      at org.apache.spark.network.sasl.SaslRpcHandler.receive(SaslRpcHandler.java:101)
      at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:149)
      at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:102)
      at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:104)
      at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
      at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
      at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
      at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
      at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
      at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
      at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
      at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
      at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
      at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
      at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:86)
      at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
      at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
      at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
      at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
      at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
      at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
      at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
      at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
      at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
      at java.lang.Thread.run(Thread.java:745)
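The description turns on one rule: the client retries a failed request on an IOException but not on a NullPointerException. The Java class below is a simplified, hypothetical model of that interaction, not Spark's code. `RetryOnIOExceptionDemo`, `secrets`, `lookupSecretNpe`, `lookupSecretIo`, and `withRetries` are illustrative names; the retry predicate (the exception or its direct cause is an IOException) is an assumption about the client; and the `beforeRetry` callback stands in for the shuffle service finishing re-initialization.

```java
import java.io.IOException;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of the failure described in this issue; not Spark's implementation.
class RetryOnIOExceptionDemo {

  // Stand-in for the shuffle service's per-application secrets. Right after a
  // restart it is empty until applications are registered again.
  static final Map<String, String> secrets = new ConcurrentHashMap<>();

  // Mirrors the stack trace: a missing secret fails a null check and surfaces
  // as a NullPointerException.
  static String lookupSecretNpe(String appId) {
    return Objects.requireNonNull(secrets.get(appId),
        "Password cannot be null if SASL is enabled");
  }

  // Hypothetical alternative: report a not-yet-loaded secret as an IOException,
  // which the assumed retry rule treats as transient.
  static String lookupSecretIo(String appId) throws IOException {
    String secret = secrets.get(appId);
    if (secret == null) {
      throw new IOException("Secret for " + appId + " is not loaded yet");
    }
    return secret;
  }

  // Assumed client rule: retry only if the failure is, or is directly caused by,
  // an IOException. beforeRetry models the service finishing re-initialization.
  static <T> T withRetries(Callable<T> action, int maxRetries, Runnable beforeRetry)
      throws Exception {
    for (int attempt = 0; ; attempt++) {
      try {
        return action.call();
      } catch (Exception e) {
        boolean retryable =
            e instanceof IOException || e.getCause() instanceof IOException;
        if (!retryable || attempt >= maxRetries) {
          throw e; // non-retryable, or out of attempts: give up
        }
        beforeRetry.run();
      }
    }
  }

  public static void main(String[] args) throws Exception {
    String appId = "app_1";
    Runnable reRegister = () -> secrets.put(appId, "s3cret");

    // NullPointerException: not retried, so the request fails outright.
    try {
      withRetries(() -> lookupSecretNpe(appId), 3, reRegister);
    } catch (NullPointerException e) {
      System.out.println("NPE path failed: " + e.getMessage());
    }

    // IOException: retried, and succeeds once the secret is registered again.
    secrets.clear();
    String secret = withRetries(() -> lookupSecretIo(appId), 3, reRegister);
    System.out.println("IOException path got: " + secret);
  }
}
```

Running `main` prints `NPE path failed: Password cannot be null if SASL is enabled` and then `IOException path got: s3cret`: the same missing secret is fatal when reported as a NullPointerException but recoverable when reported as an IOException.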


          People

            Assignee: Thomas Graves (tgraves)
            Reporter: Thomas Graves (tgraves)
            Votes: 0
            Watchers: 2
