Hadoop Map/Reduce / MAPREDUCE-7156

NullPointerException when reaching max shuffle connections


Details

    • Reviewed

    Description

      When you hit the max number of shuffle connections, you can get a lot of NullPointerExceptions from Netty:

      2018-07-17 10:47:36,311 INFO org.apache.hadoop.mapred.ShuffleHandler: Current number of shuffle connections (360) is greater than or equal to the max allowed shuffle connections (360)
      2018-07-17 10:47:36,311 INFO org.apache.hadoop.mapred.ShuffleHandler: Current number of shuffle connections (360) is greater than or equal to the max allowed shuffle connections (360)
      2018-07-17 10:47:36,312 INFO org.apache.hadoop.mapred.ShuffleHandler: Current number of shuffle connections (360) is greater than or equal to the max allowed shuffle connections (360)
      2018-07-17 10:47:36,316 ERROR org.apache.hadoop.mapred.ShuffleHandler: Shuffle error:
      java.lang.NullPointerException
      2018-07-17 10:47:36,317 ERROR org.apache.hadoop.mapred.ShuffleHandler: Shuffle error [id: 0x71187405, /10.17.226.11:44330 => /10.17.202.21:13562] EXCEPTION: java.lang.NullPointerException
      2018-07-17 10:47:36,317 ERROR org.apache.hadoop.mapred.ShuffleHandler: Shuffle error:
      java.lang.NullPointerException
      2018-07-17 10:47:36,317 ERROR org.apache.hadoop.mapred.ShuffleHandler: Shuffle error [id: 0x71187405, /10.17.226.11:44330 => /10.17.202.21:13562] EXCEPTION: java.lang.NullPointerException
      2018-07-17 10:47:36,317 ERROR org.apache.hadoop.mapred.ShuffleHandler: Shuffle error:
      java.lang.NullPointerException
      2018-07-17 10:47:36,317 ERROR org.apache.hadoop.mapred.ShuffleHandler: Shuffle error [id: 0x71187405, /10.17.226.11:44330 => /10.17.202.21:13562] EXCEPTION: java.lang.NullPointerException
      2018-07-17 10:47:36,329 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Skipping monitoring container container_e22_1531424278071_55040_01_002295 since CPU usage is not yet available.
      2018-07-17 10:47:36,340 ERROR org.apache.hadoop.mapred.ShuffleHandler: Shuffle error:
      java.lang.NullPointerException
      2018-07-17 10:47:36,340 ERROR org.apache.hadoop.mapred.ShuffleHandler: Shuffle error [id: 0xea8afd26, /10.17.202.18:43810 => /10.17.202.21:13562] EXCEPTION: java.lang.NullPointerException
      2018-07-17 10:47:36,349 ERROR org.apache.hadoop.mapred.ShuffleHandler: Shuffle error:
      java.lang.NullPointerException
      2018-07-17 10:47:36,349 ERROR org.apache.hadoop.mapred.ShuffleHandler: Shuffle error [id: 0xea8afd26, /10.17.202.18:43810 => /10.17.202.21:13562] EXCEPTION: java.lang.NullPointerException
      2018-07-17 10:47:36,349 ERROR org.apache.hadoop.mapred.ShuffleHandler: Shuffle error:
      java.lang.NullPointerException
      2018-07-17 10:47:36,349 ERROR org.apache.hadoop.mapred.ShuffleHandler: Shuffle error [id: 0xea8afd26, /10.17.202.18:43810 => /10.17.202.21:13562] EXCEPTION: java.lang.NullPointerException
      2018-07-17 10:47:36,361 INFO org.apache.hadoop.mapred.ShuffleHandler: Current number of shuffle connections (360) is greater than or equal to the max allowed shuffle connections (360)
      2018-07-17 10:47:36,390 INFO org.apache.hadoop.mapred.ShuffleHandler: Current number of shuffle connections (360) is greater than or equal to the max allowed shuffle connections (360)
      2018-07-17 10:47:36,395 ERROR org.apache.hadoop.mapred.ShuffleHandler: Shuffle error:
      
      2018-07-17 13:58:28,263 INFO org.apache.hadoop.mapred.ShuffleHandler: Current number of shuffle connections (360) is greater than or equal to the max allowed shuffle connections (360)
      2018-07-17 13:58:28,264 ERROR org.apache.hadoop.mapred.ShuffleHandler: Shuffle error:
      java.lang.NullPointerException
              at org.jboss.netty.handler.timeout.IdleStateHandler.writeComplete(IdleStateHandler.java:302)
              at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:73)
              at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
              at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
              at org.jboss.netty.channel.SimpleChannelUpstreamHandler.writeComplete(SimpleChannelUpstreamHandler.java:233)
              at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:73)
              at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
              at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
              at org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:142)
              at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
              at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
              at org.jboss.netty.channel.SimpleChannelUpstreamHandler.writeComplete(SimpleChannelUpstreamHandler.java:233)
              at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:73)
              at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
              at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
              at org.jboss.netty.channel.SimpleChannelUpstreamHandler.writeComplete(SimpleChannelUpstreamHandler.java:233)
              at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:73)
              at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
              at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
              at org.jboss.netty.channel.SimpleChannelUpstreamHandler.writeComplete(SimpleChannelUpstreamHandler.java:233)
              at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:73)
              at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
              at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
              at org.jboss.netty.channel.Channels.fireWriteComplete(Channels.java:324)
              at org.jboss.netty.channel.socket.nio.AbstractNioWorker.write0(AbstractNioWorker.java:299)
              at org.jboss.netty.channel.socket.nio.AbstractNioWorker.writeFromUserCode(AbstractNioWorker.java:146)
              at org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.handleAcceptedSocket(NioServerSocketPipelineSink.java:99)
              at org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.eventSunk(NioServerSocketPipelineSink.java:36)
              at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:779)
              at org.jboss.netty.channel.Channels.write(Channels.java:725)
              at org.jboss.netty.channel.Channels.write(Channels.java:686)
              at org.jboss.netty.handler.ssl.SslHandler.wrapNonAppData(SslHandler.java:1110)
              at org.jboss.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1252)
              at org.jboss.netty.handler.ssl.SslHandler.decode(SslHandler.java:852)
              at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:425)
              at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
              at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
              at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
              at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
              at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
              at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
              at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
              at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
              at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
              at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
              at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
              at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
              at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)
      

      The solution seems to be a one-liner: call super.channelOpen(ctx, evt) in Shuffle.channelOpen() in both cases. If we don't, IdleStateHandler is not initialized properly and gets a null attachment object when executing writeComplete().
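
      To illustrate the failure mode, here is a minimal plain-Java model with no Netty dependency. Every name in it (ChannelOpenSketch, IdleTracker, ShuffleModel, forwardOpenEvent) is hypothetical and not taken from Hadoop or Netty. It assumes that the open event must be forwarded for the downstream handler to create its per-channel state, and that a write can still complete on a channel that was rejected because the connection limit was reached. It is a sketch of the pattern, not the actual patch.

```java
import java.util.HashMap;
import java.util.Map;

public class ChannelOpenSketch {

    /** Hypothetical stand-in for a handler that keeps per-channel state, like IdleStateHandler. */
    static class IdleTracker {
        private final Map<Integer, long[]> state = new HashMap<>();

        void channelOpen(int channelId) {
            // Per-channel state is created only when the open event arrives here.
            state.put(channelId, new long[] {0L});
        }

        void writeComplete(int channelId) {
            long[] s = state.get(channelId); // null if the open event was never forwarded
            s[0] = System.nanoTime();        // throws NullPointerException when s is null
        }
    }

    /** Hypothetical stand-in for Shuffle: closes connections above the limit. */
    static class ShuffleModel {
        private final IdleTracker next;
        private final int maxConnections;
        private final boolean forwardOpenEvent;
        private int accepted = 0;

        ShuffleModel(IdleTracker next, int maxConnections, boolean forwardOpenEvent) {
            this.next = next;
            this.maxConnections = maxConnections;
            this.forwardOpenEvent = forwardOpenEvent;
        }

        /** Returns true if the connection was accepted. */
        boolean channelOpen(int channelId) {
            if (forwardOpenEvent) {
                // Analogue of super.channelOpen(ctx, evt): done before the limit check,
                // so it happens whether the connection is accepted or rejected.
                next.channelOpen(channelId);
            }
            if (accepted >= maxConnections) {
                return false; // limit reached: the real handler would close the channel here
            }
            accepted++;
            return true;
        }
    }

    /** Opens one accepted and one rejected channel, then completes a write on the rejected one. */
    static boolean writeCompleteThrowsNpe(boolean forwardOpenEvent) {
        IdleTracker idle = new IdleTracker();
        ShuffleModel shuffle = new ShuffleModel(idle, 1, forwardOpenEvent);
        shuffle.channelOpen(1); // accepted
        shuffle.channelOpen(2); // rejected: limit of 1 already reached
        try {
            idle.writeComplete(2);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("open event not forwarded, NPE: " + writeCompleteThrowsNpe(false));
        System.out.println("open event forwarded, NPE:     " + writeCompleteThrowsNpe(true));
    }
}
```

      In the model, the write on the rejected channel plays the role of the SslHandler.wrapNonAppData() write visible in the stack trace above, which reaches IdleStateHandler.writeComplete() even though the connection is over the limit.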


          People

            Assignee: Peter Bacsko (pbacsko)
            Reporter: Peter Bacsko (pbacsko)
            Votes: 0
            Watchers: 4
