Hadoop Common / HADOOP-14727

Socket not closed properly when reading Configurations with BlockReaderRemote

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.9.0, 3.0.0-alpha4
    • Fix Version/s: 2.9.0, 3.0.0-beta1
    • Component/s: conf
    • Labels: None

      Description

      This was caught by Cloudera's internal testing of the alpha4 release.

      We got reports that some hosts ran out of file descriptors (FDs). While triaging, we found that both the Oozie server and the YARN JobHistoryServer had large numbers of sockets stuck in CLOSE_WAIT state.

      Haibo Chen helped narrow this down to a consistent reproduction: simply visit the JHS web UI and click through a job and its logs.

      I then looked at BlockReaderRemote and related code, and didn't spot any leaks in the implementation. After adding a debug log whenever a Peer is created, closed, or put into/taken out of the PeerCache, it looks like all the CLOSE_WAIT sockets are created from this call stack:

      2017-08-02 13:58:59,901 INFO org.apache.hadoop.hdfs.client.impl.BlockReaderFactory: ____ associated peer NioInetPeer(Socket[addr=/10.17.196.28,port=20002,localport=42512]) with blockreader org.apache.hadoop.hdfs.client.impl.BlockReaderRemote@717ce109
      java.lang.Exception: test
              at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:745)
              at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:385)
              at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:636)
              at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:566)
              at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:749)
              at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:807)
              at java.io.DataInputStream.read(DataInputStream.java:149)
              at com.ctc.wstx.io.StreamBootstrapper.ensureLoaded(StreamBootstrapper.java:482)
              at com.ctc.wstx.io.StreamBootstrapper.resolveStreamEncoding(StreamBootstrapper.java:306)
              at com.ctc.wstx.io.StreamBootstrapper.bootstrapInput(StreamBootstrapper.java:167)
              at com.ctc.wstx.stax.WstxInputFactory.doCreateSR(WstxInputFactory.java:573)
              at com.ctc.wstx.stax.WstxInputFactory.createSR(WstxInputFactory.java:633)
              at com.ctc.wstx.stax.WstxInputFactory.createSR(WstxInputFactory.java:647)
              at com.ctc.wstx.stax.WstxInputFactory.createXMLStreamReader(WstxInputFactory.java:366)
              at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2649)
              at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2697)
              at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2662)
              at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2545)
              at org.apache.hadoop.conf.Configuration.get(Configuration.java:1076)
              at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1126)
              at org.apache.hadoop.conf.Configuration.getInt(Configuration.java:1344)
              at org.apache.hadoop.mapreduce.counters.Limits.init(Limits.java:45)
              at org.apache.hadoop.mapreduce.counters.Limits.reset(Limits.java:130)
              at org.apache.hadoop.mapreduce.v2.hs.CompletedJob.loadFullHistoryData(CompletedJob.java:363)
              at org.apache.hadoop.mapreduce.v2.hs.CompletedJob.<init>(CompletedJob.java:105)
              at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$HistoryFileInfo.loadJob(HistoryFileManager.java:473)
              at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.loadJob(CachedHistoryStorage.java:180)
              at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.access$000(CachedHistoryStorage.java:52)
              at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage$1.load(CachedHistoryStorage.java:103)
              at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage$1.load(CachedHistoryStorage.java:100)
              at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
              at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350)
              at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
              at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
              at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
              at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3969)
              at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4829)
              at com.google.common.cache.LocalCache$LocalManualCache.getUnchecked(LocalCache.java:4834)
              at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.getFullJob(CachedHistoryStorage.java:193)
              at org.apache.hadoop.mapreduce.v2.hs.JobHistory.getJob(JobHistory.java:220)
              at org.apache.hadoop.mapreduce.v2.app.webapp.AppController.requireJob(AppController.java:416)
              at org.apache.hadoop.mapreduce.v2.app.webapp.AppController.attempts(AppController.java:277)
              at org.apache.hadoop.mapreduce.v2.hs.webapp.HsController.attempts(HsController.java:152)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.lang.reflect.Method.invoke(Method.java:498)
              at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:162)
              at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
              at com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:287)
              at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:277)
              at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:182)
              at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
              at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:85)
              at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:941)
              at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:875)
              at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:829)
              at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82)
              at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:119)
              at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:133)
              at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:130)
              at com.google.inject.servlet.GuiceFilter$Context.call(GuiceFilter.java:203)
              at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:130)
              at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
              at org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57)
              at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
              at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
              at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
              at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1552)
              at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
              at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
              at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
              at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
              at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
              at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
              at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
              at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
              at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
              at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
              at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
              at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
              at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
              at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
              at org.eclipse.jetty.server.Server.handle(Server.java:534)
              at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
              at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
              at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
              at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
              at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
              at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
              at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
              at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
              at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
              at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
              at java.lang.Thread.run(Thread.java:748)
      

      I was able to further confirm this theory by backing out the 4 recent commits to Configuration on top of alpha3, after which the CLOSE_WAIT sockets no longer appeared.

      It's not clear to me who's responsible for closing the InputStream, though.
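
      To make the ownership question concrete, here is a minimal hypothetical sketch (the path and key are made up; this is not the actual JHS code, only the addResource(InputStream) pattern it exercises):

      import java.io.InputStream;
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class ConfStreamOwnership {
        public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration(false);
          FileSystem fs = FileSystem.get(new Configuration());
          // The caller opens a stream over HDFS (backed by a DataNode socket).
          InputStream in = fs.open(new Path("/tmp/job_conf.xml")); // hypothetical path
          conf.addResource(in);  // Configuration keeps the stream; parsing is lazy
          conf.get("some.key");  // first property access triggers the actual parse
          // The open question: after the parse, who calls in.close()?
        }
      }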

      1. HADOOP-14727.001-branch-2.patch
        3 kB
        Jonathan Eagles
      2. HADOOP-14727.001.patch
        3 kB
        Jonathan Eagles
      3. HADOOP-14727.002.patch
        3 kB
        Jonathan Eagles


          Activity

          xiaochen Xiao Chen added a comment -

          Hi Jonathan Eagles,
          Would you be able to look into this issue, as the author of HADOOP-14216 and HADOOP-14501?

          stevel@apache.org Steve Loughran added a comment -

          Good catch. I'd assume Configuration will have to take up the closing; given close() is meant to be idempotent, I don't see any harm in calling it twice anyway.
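
          For reference, java.io.Closeable does specify that closing an already-closed stream has no effect, so a second close is a no-op by contract; a minimal illustration (the file name is made up):

          import java.io.InputStream;
          import java.nio.file.Files;
          import java.nio.file.Paths;

          public class DoubleClose {
            public static void main(String[] args) throws Exception {
              InputStream in = Files.newInputStream(Paths.get("core-site.xml"));
              try {
                // ... read the stream ...
                in.close();  // first close releases the underlying descriptor
              } finally {
                in.close();  // second close is a no-op per the Closeable contract
              }
            }
          }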

          jeagles Jonathan Eagles added a comment -

          Thanks for looking into this. I'll try to post a patch for it today.

          xiaochen Xiao Chen added a comment -

          Thanks Steve and Jonathan! Assigning this jira to Jonathan, I can help with reviews.

          jeagles Jonathan Eagles added a comment -

          Verified that CLOSE_WAIT sockets were being leaked on a branch-2 JobHistoryServer with a simple lsof | grep CLOSE_WAIT while reloading a specific MapReduce job's configuration. With the patch, no CLOSE_WAIT sockets are left. The fix is to flag InputStreams as auto-close if they were opened by Configuration itself, and to leave them as-is if the InputStream was passed in as a resource, to avoid closing a stream the user opened.
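
          A minimal sketch of that approach (field and method names here are illustrative, not the literal patch):

          import java.io.IOException;
          import java.io.InputStream;

          // Each resource records who opened it.
          class Resource {
            final Object resource;
            final boolean autoClose;  // true only when Configuration opened the stream itself

            Resource(Object resource, boolean autoClose) {
              this.resource = resource;
              this.autoClose = autoClose;
            }
          }

          class Loader {
            // After parsing in loadResource(): close only streams Configuration opened.
            static void finishParse(Resource res) throws IOException {
              if (res.resource instanceof InputStream && res.autoClose) {
                ((InputStream) res.resource).close();
              }
            }
          }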

          xiaochen Xiao Chen added a comment -

          Thanks Jonathan Eagles for quickly getting to this! The fix looks like the right direction to me.

          Could you add a unit test?

          What confuses me is that, for the 3.0.0 reproduction stack trace I pasted, this leak is actually coming from the resource instanceof InputStream code block: the stack's org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2697) points to that line. (I added 1 line of debug logging locally.) This was my initial confusion about the responsibility for closing the streams.
          Hand-verified that backporting patch 1 to the internal cluster doesn't make the CLOSE_WAIT sockets go away. Thoughts?

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 19s Docker mode activated.
                Prechecks
          +1 @author 0m 0s The patch does not contain any @author tags.
          -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
                branch-2 Compile Tests
          +1 mvninstall 7m 30s branch-2 passed
          +1 compile 5m 58s branch-2 passed with JDK v1.8.0_131
          +1 compile 6m 44s branch-2 passed with JDK v1.7.0_131
          +1 checkstyle 0m 27s branch-2 passed
          +1 mvnsite 0m 59s branch-2 passed
          +1 findbugs 1m 38s branch-2 passed
          +1 javadoc 0m 42s branch-2 passed with JDK v1.8.0_131
          +1 javadoc 0m 50s branch-2 passed with JDK v1.7.0_131
                Patch Compile Tests
          +1 mvninstall 0m 40s the patch passed
          +1 compile 5m 35s the patch passed with JDK v1.8.0_131
          +1 javac 5m 35s the patch passed
          +1 compile 6m 38s the patch passed with JDK v1.7.0_131
          +1 javac 6m 38s the patch passed
          -0 checkstyle 0m 26s hadoop-common-project/hadoop-common: The patch generated 1 new + 147 unchanged - 1 fixed = 148 total (was 148)
          +1 mvnsite 0m 58s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 54s the patch passed
          +1 javadoc 0m 42s the patch passed with JDK v1.8.0_131
          +1 javadoc 0m 49s the patch passed with JDK v1.7.0_131
                Other Tests
          -1 unit 7m 54s hadoop-common in the patch failed with JDK v1.7.0_131.
          +1 asflicense 0m 22s The patch does not generate ASF License warnings.
          61m 1s



          Reason Tests
          JDK v1.8.0_131 Failed junit tests hadoop.net.TestDNS
          JDK v1.7.0_131 Failed junit tests hadoop.net.TestDNS



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:5e40efe
          JIRA Issue HADOOP-14727
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12880297/HADOOP-14727.001-branch-2.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 722ab85978b7 3.13.0-117-generic #164-Ubuntu SMP Fri Apr 7 11:05:26 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision branch-2 / b6729a7
          Default Java 1.7.0_131
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_131 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_131
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/12945/artifact/patchprocess/diff-checkstyle-hadoop-common-project_hadoop-common.txt
          unit https://builds.apache.org/job/PreCommit-HADOOP-Build/12945/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common-jdk1.7.0_131.txt
          JDK v1.7.0_131 Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/12945/testReport/
          modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/12945/console
          Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          xiaochen Xiao Chen added a comment -

          I also looked into the local repro with the 4 mentioned jiras backed out.
          This is how the same BlockReaderRemote created above gets closed:

          2017-08-03 21:10:53,052 INFO org.apache.hadoop.hdfs.client.impl.BlockReaderRemote: ____ closing blockreaderremote org.apache.hadoop.hdfs.client.impl.BlockReaderRemote@80ceea3
          java.lang.Exception: ____
                  at org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.close(BlockReaderRemote.java:310)
                  at org.apache.hadoop.hdfs.DFSInputStream.closeCurrentBlockReaders(DFSInputStream.java:1572)
                  at org.apache.hadoop.hdfs.DFSInputStream.close(DFSInputStream.java:664)
                  at java.io.FilterInputStream.close(FilterInputStream.java:181)
                  at org.apache.xerces.impl.XMLEntityManager$RewindableInputStream.close(Unknown Source)
                  at org.apache.xerces.impl.io.UTF8Reader.close(Unknown Source)
                  at org.apache.xerces.impl.XMLEntityManager.endEntity(Unknown Source)
                  at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
                  at org.apache.xerces.impl.XMLEntityScanner.skipSpaces(Unknown Source)
                  at org.apache.xerces.impl.XMLDocumentScannerImpl$TrailingMiscDispatcher.dispatch(Unknown Source)
                  at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
                  at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
                  at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
                  at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
                  at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
                  at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
                  at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
                  at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2645)
                  at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2713)
                  at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2662)
                  at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2540)
                  at org.apache.hadoop.conf.Configuration.get(Configuration.java:1071)
                  at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1121)
                  at org.apache.hadoop.conf.Configuration.getInt(Configuration.java:1339)
                  at org.apache.hadoop.mapreduce.counters.Limits.init(Limits.java:45)
                  at org.apache.hadoop.mapreduce.counters.Limits.reset(Limits.java:130)
                  at org.apache.hadoop.mapreduce.v2.hs.CompletedJob.loadFullHistoryData(CompletedJob.java:363)
                  at org.apache.hadoop.mapreduce.v2.hs.CompletedJob.<init>(CompletedJob.java:105)
                  at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$HistoryFileInfo.loadJob(HistoryFileManager.java:473)
                  at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.loadJob(CachedHistoryStorage.java:180)
                  at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.access$000(CachedHistoryStorage.java:52)
                  at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage$1.load(CachedHistoryStorage.java:103)
                  at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage$1.load(CachedHistoryStorage.java:100)
                  at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
                  at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350)
                  at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
                  at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
                  at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
                  at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3969)
                  at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4829)
                  at com.google.common.cache.LocalCache$LocalManualCache.getUnchecked(LocalCache.java:4834)
                  at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.getFullJob(CachedHistoryStorage.java:193)
          

          where

          2712      } else if (resource instanceof InputStream) {
          2713        doc = parse(builder, (InputStream) resource, null);
          2714        returnCachedProperties = true;
          2715      } else if (resource instanceof Properties) {
          

          Although I naturally lean the same way as patch 1, it seems the existing behavior is to close unconditionally.

          jeagles Jonathan Eagles added a comment -

          I verified that the calling code expects Configuration to close the input streams. I have reverted to the old behavior and now close all input streams, whether opened by the user or by the Configuration object itself.
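
          In other words, the stream is now closed regardless of who opened it; a simplified sketch of that behavior in the loadResource() branch quoted above (an assumption drawn from the thread, not the literal diff):

          } else if (resource instanceof InputStream) {
            InputStream in = (InputStream) resource;
            try {
              doc = parse(builder, in, null);
              returnCachedProperties = true;
            } finally {
              in.close();  // always close; a second close by the caller is a no-op
            }
          }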

          xiaochen Xiao Chen added a comment -

          Thanks a lot Jonathan Eagles, +1 on v2 pending Jenkins.

          hadoopqa Hadoop QA added a comment -
          +1 overall



          Vote Subsystem Runtime Comment
          0 reexec 2m 0s Docker mode activated.
                Prechecks
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
                trunk Compile Tests
          +1 mvninstall 15m 16s trunk passed
          +1 compile 15m 12s trunk passed
          +1 checkstyle 0m 41s trunk passed
          +1 mvnsite 1m 34s trunk passed
          +1 findbugs 1m 28s trunk passed
          +1 javadoc 0m 57s trunk passed
                Patch Compile Tests
          +1 mvninstall 0m 43s the patch passed
          +1 compile 14m 21s the patch passed
          +1 javac 14m 21s the patch passed
          -0 checkstyle 0m 46s hadoop-common-project/hadoop-common: The patch generated 2 new + 264 unchanged - 1 fixed = 266 total (was 265)
          +1 mvnsite 1m 41s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 37s the patch passed
          +1 javadoc 0m 59s the patch passed
                Other Tests
          +1 unit 8m 47s hadoop-common in the patch passed.
          +1 asflicense 0m 29s The patch does not generate ASF License warnings.
          68m 35s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:14b5c93
          JIRA Issue HADOOP-14727
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12880450/HADOOP-14727.002.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux e73d6da88ff4 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / f44b349
          Default Java 1.8.0_131
          findbugs v3.1.0-RC1
          checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/12957/artifact/patchprocess/diff-checkstyle-hadoop-common-project_hadoop-common.txt
          Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/12957/testReport/
          modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/12957/console
          Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          xiaochen Xiao Chen added a comment (edited) -

          +1. The checkstyle warnings are related, but they're only 80-character line-length issues and could be fixed during commit IMO.
          The branch-2 backport's conflicts are also trivial, import-only.

          I'll commit to trunk and branch-2 on Monday morning in case Steve and others want to comment.

          xiaochen Xiao Chen added a comment -

          +1, committing this

          xiaochen Xiao Chen added a comment -

          Committed this to trunk and branch-2.
          Compiled and ran TestConfiguration on branch-2 before pushing.

          Thanks Haibo Chen for the initial report, Jonathan Eagles for the fix and Steve Loughran for commenting!

          stevel@apache.org Steve Loughran added a comment -

          Thank you for finding it and tracking down the problem!


            People

            • Assignee: jeagles Jonathan Eagles
            • Reporter: xiaochen Xiao Chen
            • Votes: 0
            • Watchers: 6
