HADOOP-14727 (Hadoop Common)

Socket not closed properly when reading Configurations with BlockReaderRemote

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.9.0, 3.0.0-alpha4
    • Fix Version/s: 2.9.0, 3.0.0-beta1
    • Component/s: conf
    • Labels:
      None

      Description

      This was caught by Cloudera's internal testing of the alpha4 release.

      We received reports that some hosts ran out of file descriptors (FDs). While triaging, we found that both the Oozie server and the YARN JobHistoryServer had a large number of sockets in the CLOSE_WAIT state.
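CLOSE_WAIT is the TCP state of a socket whose peer has already closed its side (sent a FIN) while the local process has not yet called close(); every such socket keeps holding a file descriptor. A minimal loopback sketch (the class and method names are hypothetical, not Hadoop code) shows that the local side sees end-of-stream but still owns the socket until it closes it explicitly:

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class CloseWaitDemo {
    // Opens a loopback connection and lets the remote end close first.
    // The returned local socket stays in CLOSE_WAIT until the caller closes it.
    static Socket connectAndLetPeerClose() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            Socket local = new Socket(loopback, server.getLocalPort());
            // The accepted socket plays the remote server;
            // closing it sends a FIN to the local side.
            Socket remote = server.accept();
            remote.close();
            local.setSoTimeout(5_000); // fail instead of hanging if no FIN arrives
            return local;
        }
    }

    public static void main(String[] args) throws IOException {
        Socket local = connectAndLetPeerClose();
        int b = local.getInputStream().read(); // -1: EOF, the peer has closed its end
        System.out.println("read() returned " + b);
        System.out.println("closed locally: " + local.isClosed()); // false: FD still held
        local.close(); // only an explicit close releases the FD
        System.out.println("closed locally: " + local.isClosed());
    }
}
```

Receiving EOF does not free anything by itself: a process that reads to the end of a stream and then drops the reference without closing it accumulates exactly the kind of CLOSE_WAIT sockets described above.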

      Haibo Chen helped narrow this down to a consistent reproduction: simply visit the JobHistoryServer (JHS) web UI and click through a job and its logs.

      I then looked at BlockReaderRemote and the related code and didn't spot any leaks in the implementation. After adding a debug log whenever a Peer is created, closed, or moved into or out of the PeerCache, it appeared that all of the CLOSE_WAIT sockets were created from this call stack:

      2017-08-02 13:58:59,901 INFO org.apache.hadoop.hdfs.client.impl.BlockReaderFactory: ____ associated peer NioInetPeer(Socket[addr=/10.17.196.28,port=20002,localport=42512]) with blockreader org.apache.hadoop.hdfs.client.impl.BlockReaderRemote@717ce109
      java.lang.Exception: test
              at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:745)
              at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:385)
              at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:636)
              at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:566)
              at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:749)
              at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:807)
              at java.io.DataInputStream.read(DataInputStream.java:149)
              at com.ctc.wstx.io.StreamBootstrapper.ensureLoaded(StreamBootstrapper.java:482)
              at com.ctc.wstx.io.StreamBootstrapper.resolveStreamEncoding(StreamBootstrapper.java:306)
              at com.ctc.wstx.io.StreamBootstrapper.bootstrapInput(StreamBootstrapper.java:167)
              at com.ctc.wstx.stax.WstxInputFactory.doCreateSR(WstxInputFactory.java:573)
              at com.ctc.wstx.stax.WstxInputFactory.createSR(WstxInputFactory.java:633)
              at com.ctc.wstx.stax.WstxInputFactory.createSR(WstxInputFactory.java:647)
              at com.ctc.wstx.stax.WstxInputFactory.createXMLStreamReader(WstxInputFactory.java:366)
              at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2649)
              at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2697)
              at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2662)
              at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2545)
              at org.apache.hadoop.conf.Configuration.get(Configuration.java:1076)
              at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1126)
              at org.apache.hadoop.conf.Configuration.getInt(Configuration.java:1344)
              at org.apache.hadoop.mapreduce.counters.Limits.init(Limits.java:45)
              at org.apache.hadoop.mapreduce.counters.Limits.reset(Limits.java:130)
              at org.apache.hadoop.mapreduce.v2.hs.CompletedJob.loadFullHistoryData(CompletedJob.java:363)
              at org.apache.hadoop.mapreduce.v2.hs.CompletedJob.<init>(CompletedJob.java:105)
              at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$HistoryFileInfo.loadJob(HistoryFileManager.java:473)
              at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.loadJob(CachedHistoryStorage.java:180)
              at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.access$000(CachedHistoryStorage.java:52)
              at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage$1.load(CachedHistoryStorage.java:103)
              at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage$1.load(CachedHistoryStorage.java:100)
              at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
              at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350)
              at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
              at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
              at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
              at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3969)
              at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4829)
              at com.google.common.cache.LocalCache$LocalManualCache.getUnchecked(LocalCache.java:4834)
              at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.getFullJob(CachedHistoryStorage.java:193)
              at org.apache.hadoop.mapreduce.v2.hs.JobHistory.getJob(JobHistory.java:220)
              at org.apache.hadoop.mapreduce.v2.app.webapp.AppController.requireJob(AppController.java:416)
              at org.apache.hadoop.mapreduce.v2.app.webapp.AppController.attempts(AppController.java:277)
              at org.apache.hadoop.mapreduce.v2.hs.webapp.HsController.attempts(HsController.java:152)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.lang.reflect.Method.invoke(Method.java:498)
              at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:162)
              at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
              at com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:287)
              at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:277)
              at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:182)
              at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
              at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:85)
              at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:941)
              at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:875)
              at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:829)
              at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82)
              at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:119)
              at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:133)
              at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:130)
              at com.google.inject.servlet.GuiceFilter$Context.call(GuiceFilter.java:203)
              at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:130)
              at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
              at org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57)
              at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
              at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
              at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
              at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1552)
              at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
              at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
              at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
              at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
              at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
              at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
              at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
              at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
              at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
              at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
              at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
              at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
              at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
              at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
              at org.eclipse.jetty.server.Server.handle(Server.java:534)
              at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
              at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
              at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
              at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
              at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
              at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
              at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
              at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
              at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
              at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
              at java.lang.Thread.run(Thread.java:748)
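The debug logging described above (recording the call stack whenever a Peer is created or closed) is a general allocation-site tracking technique. A minimal sketch, assuming a hypothetical OpenResourceTracker class that is not part of Hadoop: each open records an Exception that is created but never thrown, each close removes the entry, and whatever remains points at the code path that leaked it.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

public class OpenResourceTracker {
    // Still-open resources mapped to an Exception created (never thrown) where each was opened.
    private final Map<Object, Exception> openSites = new IdentityHashMap<>();

    public synchronized void onOpen(Object resource) {
        // Constructing a Throwable captures the current call stack, similar to
        // the "java.lang.Exception: test" entry in the log above.
        openSites.put(resource, new Exception("opened: " + resource));
    }

    public synchronized void onClose(Object resource) {
        openSites.remove(resource);
    }

    // Returns (and prints to stderr) the creation stack of every resource never closed.
    public synchronized List<Exception> leaks() {
        List<Exception> result = new ArrayList<>(openSites.values());
        for (Exception site : result) {
            site.printStackTrace();
        }
        return result;
    }

    public static void main(String[] args) {
        OpenResourceTracker tracker = new OpenResourceTracker();
        Object peerA = new Object(); // hypothetical stand-ins for Peer objects
        Object peerB = new Object();
        tracker.onOpen(peerA);
        tracker.onOpen(peerB);
        tracker.onClose(peerA); // peerB is never closed
        System.out.println("leaked: " + tracker.leaks().size()); // leaked: 1
    }
}
```

An IdentityHashMap is used so that tracking depends on object identity rather than on whatever equals() the tracked class defines.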
      

      I further confirmed this theory by backing out the four recent commits to Configuration on alpha3: with them removed, the CLOSE_WAIT sockets no longer appeared.

      It's not clear to me, though, who is responsible for closing the InputStream.
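On the ownership question: the StAX javadoc for XMLStreamReader.close() states that it does not close the underlying input source, so closing the reader alone does not close the stream it was created from. A common convention is that the code that opens a stream is the code that closes it. The sketch below (hypothetical names, using the JDK's default StAX implementation rather than Woodstox) has the parsing method take ownership of the stream and close it with try-with-resources:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class OwnedStreamParse {
    private static final XMLInputFactory FACTORY = XMLInputFactory.newInstance();

    // Stand-in for an HDFS input stream: records whether close() was called.
    static class TrackingInputStream extends ByteArrayInputStream {
        boolean closed;

        TrackingInputStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    // Counts <property> elements. This method takes ownership of 'in' and closes it,
    // because XMLStreamReader.close() does not close the underlying stream.
    static int countProperties(InputStream in) throws IOException, XMLStreamException {
        try (InputStream owned = in) {
            XMLStreamReader reader = FACTORY.createXMLStreamReader(owned);
            try {
                int count = 0;
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "property".equals(reader.getLocalName())) {
                        count++;
                    }
                }
                return count;
            } finally {
                reader.close(); // frees parser state only
            }
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = ("<configuration>"
                + "<property><name>a</name><value>1</value></property>"
                + "<property><name>b</name><value>2</value></property>"
                + "</configuration>").getBytes(StandardCharsets.UTF_8);
        TrackingInputStream in = new TrackingInputStream(xml);
        System.out.println("properties: " + countProperties(in)); // properties: 2
        System.out.println("stream closed: " + in.closed);         // stream closed: true
    }
}
```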

        Attachments

        1. HADOOP-14727.002.patch (3 kB, Jonathan Eagles)
        2. HADOOP-14727.001.patch (3 kB, Jonathan Eagles)
        3. HADOOP-14727.001-branch-2.patch (3 kB, Jonathan Eagles)

              People

              • Assignee: Jonathan Eagles (jeagles)
              • Reporter: Xiao Chen (xiaochen)
              • Votes: 0
              • Watchers: 6
