Uploaded image for project: 'Hadoop Common'
  1. Hadoop Common
  2. HADOOP-12829

StatisticsDataReferenceCleaner swallows interrupt exceptions

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.8.0, 2.7.3, 2.6.4
    • Fix Version/s: 2.9.0, 3.0.0-alpha1
    • Component/s: fs
    • Labels:
      None
    • Target Version/s:

      Description

      The StatisticsDataReferenceCleaner, implemented in HADOOP-12107 swallows interrupt exceptions. Over in Solr/Sentry land, we run thread leak checkers on our test code, which passed before this change and fails after it. Here's a sample report:

      1 thread leaked from SUITE scope at org.apache.solr.handler.TestSecureReplicationHandler: 
         1) Thread[id=16, name=org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner, state=WAITING, group=TGRP-TestSecureReplicationHandler]
              at java.lang.Object.wait(Native Method)
              at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
              at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
              at org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner.run(FileSystem.java:3040)
              at java.lang.Thread.run(Thread.java:745)
      

      And here's an indication that the interrupt is being ignored:

      25209 T16 oahf.FileSystem$Statistics$StatisticsDataReferenceCleaner.run WARN exception in the cleaner thread but it will continue to run java.lang.InterruptedException
      	at java.lang.Object.wait(Native Method)
      	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
      	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
      	at org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner.run(FileSystem.java:3040)
      	at java.lang.Thread.run(Thread.java:745)
      

      This is inconsistent with how other long-running threads in hadoop, i.e. PeerCache respond to being interrupted.

      The argument for doing this in HADOOP-12107 is given as (https://issues.apache.org/jira/browse/HADOOP-12107?focusedCommentId=14598397&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14598397):

      Cleaner#run
      Catch and log InterruptedException in the while loop, such that thread does not die on a spurious wakeup. It's safe since it's a daemon thread.

      I'm unclear on what "spurious wakeup" means and it is not mentioned in https://docs.oracle.com/javase/tutorial/essential/concurrency/interrupt.html:

      A thread sends an interrupt by invoking interrupt on the Thread object for the thread to be interrupted. For the interrupt mechanism to work correctly, the interrupted thread must support its own interruption.

      So, I believe this thread should respect interruption.

      1. HADOOP-12829.patch
        1 kB
        Gregory Chanan
      2. HADOOP-12829.patch
        1 kB
        Gregory Chanan

        Issue Links

          Activity

          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #9351 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9351/)
          HADOOP-12829. StatisticsDataReferenceCleaner swallows interrupt (cmccabe: rev d9c409a4286e36387fb39e7d622e850c13315465)

          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java
          • hadoop-common-project/hadoop-common/CHANGES.txt
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-trunk-Commit #9351 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9351/ ) HADOOP-12829 . StatisticsDataReferenceCleaner swallows interrupt (cmccabe: rev d9c409a4286e36387fb39e7d622e850c13315465) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java hadoop-common-project/hadoop-common/CHANGES.txt
          Hide
          cmccabe Colin P. McCabe added a comment -

          Committed to 2.9, thanks!

          Show
          cmccabe Colin P. McCabe added a comment - Committed to 2.9, thanks!
          Hide
          gchanan Gregory Chanan added a comment -

          One small thing: in the future, please use different names for different patches (many people give them numbers).

          Sure thing. Every project is different (solr prefers patches to have the same name, except if it is substantially different, e.g. for a different branch).

          Show
          gchanan Gregory Chanan added a comment - One small thing: in the future, please use different names for different patches (many people give them numbers). Sure thing. Every project is different (solr prefers patches to have the same name, except if it is substantially different, e.g. for a different branch).
          Hide
          cmccabe Colin P. McCabe added a comment -

          +1. Thanks, Gregory Chanan. Good improvement.

          One small thing: in the future, please use different names for different patches (many people give them numbers).

          Show
          cmccabe Colin P. McCabe added a comment - +1. Thanks, Gregory Chanan . Good improvement. One small thing: in the future, please use different names for different patches (many people give them numbers).
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 11s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1 mvninstall 7m 18s trunk passed
          +1 compile 6m 13s trunk passed with JDK v1.8.0_72
          +1 compile 6m 59s trunk passed with JDK v1.7.0_95
          +1 checkstyle 0m 23s trunk passed
          +1 mvnsite 1m 5s trunk passed
          +1 mvneclipse 0m 16s trunk passed
          +1 findbugs 1m 38s trunk passed
          +1 javadoc 0m 54s trunk passed with JDK v1.8.0_72
          +1 javadoc 1m 8s trunk passed with JDK v1.7.0_95
          +1 mvninstall 0m 43s the patch passed
          +1 compile 6m 48s the patch passed with JDK v1.8.0_72
          +1 javac 6m 48s the patch passed
          +1 compile 6m 50s the patch passed with JDK v1.7.0_95
          +1 javac 6m 50s the patch passed
          +1 checkstyle 0m 22s the patch passed
          +1 mvnsite 1m 3s the patch passed
          +1 mvneclipse 0m 13s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 findbugs 1m 57s the patch passed
          +1 javadoc 0m 57s the patch passed with JDK v1.8.0_72
          +1 javadoc 1m 5s the patch passed with JDK v1.7.0_95
          +1 unit 6m 47s hadoop-common in the patch passed with JDK v1.8.0_72.
          -1 unit 7m 2s hadoop-common in the patch failed with JDK v1.7.0_95.
          +1 asflicense 0m 24s Patch does not generate ASF License warnings.
          61m 22s



          Reason Tests
          JDK v1.7.0_95 Failed junit tests hadoop.net.TestClusterTopology



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12789086/HADOOP-12829.patch
          JIRA Issue HADOOP-12829
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 9f779c158052 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 66289a3
          Default Java 1.7.0_95
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_72 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-HADOOP-Build/8691/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common-jdk1.7.0_95.txt
          unit test logs https://builds.apache.org/job/PreCommit-HADOOP-Build/8691/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common-jdk1.7.0_95.txt
          JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/8691/testReport/
          modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/8691/console
          Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 11s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 mvninstall 7m 18s trunk passed +1 compile 6m 13s trunk passed with JDK v1.8.0_72 +1 compile 6m 59s trunk passed with JDK v1.7.0_95 +1 checkstyle 0m 23s trunk passed +1 mvnsite 1m 5s trunk passed +1 mvneclipse 0m 16s trunk passed +1 findbugs 1m 38s trunk passed +1 javadoc 0m 54s trunk passed with JDK v1.8.0_72 +1 javadoc 1m 8s trunk passed with JDK v1.7.0_95 +1 mvninstall 0m 43s the patch passed +1 compile 6m 48s the patch passed with JDK v1.8.0_72 +1 javac 6m 48s the patch passed +1 compile 6m 50s the patch passed with JDK v1.7.0_95 +1 javac 6m 50s the patch passed +1 checkstyle 0m 22s the patch passed +1 mvnsite 1m 3s the patch passed +1 mvneclipse 0m 13s the patch passed +1 whitespace 0m 0s Patch has no whitespace issues. +1 findbugs 1m 57s the patch passed +1 javadoc 0m 57s the patch passed with JDK v1.8.0_72 +1 javadoc 1m 5s the patch passed with JDK v1.7.0_95 +1 unit 6m 47s hadoop-common in the patch passed with JDK v1.8.0_72. -1 unit 7m 2s hadoop-common in the patch failed with JDK v1.7.0_95. +1 asflicense 0m 24s Patch does not generate ASF License warnings. 61m 22s Reason Tests JDK v1.7.0_95 Failed junit tests hadoop.net.TestClusterTopology Subsystem Report/Notes Docker Image:yetus/hadoop:0ca8df7 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12789086/HADOOP-12829.patch JIRA Issue HADOOP-12829 Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 9f779c158052 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 66289a3 Default Java 1.7.0_95 Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_72 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95 findbugs v3.0.0 unit https://builds.apache.org/job/PreCommit-HADOOP-Build/8691/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common-jdk1.7.0_95.txt unit test logs https://builds.apache.org/job/PreCommit-HADOOP-Build/8691/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common-jdk1.7.0_95.txt JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/8691/testReport/ modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/8691/console Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          gchanan Gregory Chanan added a comment -

          Updated patch for Colin's comments.

          Show
          gchanan Gregory Chanan added a comment - Updated patch for Colin's comments.
          Hide
          gchanan Gregory Chanan added a comment -

          Thanks for taking a look Colin P. McCabe, will update the patch.

          Also agree with sjlee, lowering the priority to minor.

          Show
          gchanan Gregory Chanan added a comment - Thanks for taking a look Colin P. McCabe , will update the patch. Also agree with sjlee, lowering the priority to minor.
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 10s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1 mvninstall 6m 31s trunk passed
          +1 compile 5m 52s trunk passed with JDK v1.8.0_72
          +1 compile 6m 37s trunk passed with JDK v1.7.0_95
          +1 checkstyle 0m 22s trunk passed
          +1 mvnsite 1m 1s trunk passed
          +1 mvneclipse 0m 14s trunk passed
          +1 findbugs 1m 31s trunk passed
          +1 javadoc 0m 54s trunk passed with JDK v1.8.0_72
          +1 javadoc 1m 2s trunk passed with JDK v1.7.0_95
          +1 mvninstall 0m 39s the patch passed
          +1 compile 5m 44s the patch passed with JDK v1.8.0_72
          +1 javac 5m 44s the patch passed
          +1 compile 6m 39s the patch passed with JDK v1.7.0_95
          +1 javac 6m 39s the patch passed
          +1 checkstyle 0m 21s the patch passed
          +1 mvnsite 1m 2s the patch passed
          +1 mvneclipse 0m 13s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 findbugs 1m 48s the patch passed
          +1 javadoc 0m 51s the patch passed with JDK v1.8.0_72
          +1 javadoc 1m 5s the patch passed with JDK v1.7.0_95
          +1 unit 6m 43s hadoop-common in the patch passed with JDK v1.8.0_72.
          +1 unit 7m 8s hadoop-common in the patch passed with JDK v1.7.0_95.
          +1 asflicense 0m 22s Patch does not generate ASF License warnings.
          57m 56s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12788805/HADOOP-12829.patch
          JIRA Issue HADOOP-12829
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 1b3a38ba5b74 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 3fab885
          Default Java 1.7.0_95
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_72 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
          findbugs v3.0.0
          JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/8686/testReport/
          modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/8686/console
          Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 10s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 mvninstall 6m 31s trunk passed +1 compile 5m 52s trunk passed with JDK v1.8.0_72 +1 compile 6m 37s trunk passed with JDK v1.7.0_95 +1 checkstyle 0m 22s trunk passed +1 mvnsite 1m 1s trunk passed +1 mvneclipse 0m 14s trunk passed +1 findbugs 1m 31s trunk passed +1 javadoc 0m 54s trunk passed with JDK v1.8.0_72 +1 javadoc 1m 2s trunk passed with JDK v1.7.0_95 +1 mvninstall 0m 39s the patch passed +1 compile 5m 44s the patch passed with JDK v1.8.0_72 +1 javac 5m 44s the patch passed +1 compile 6m 39s the patch passed with JDK v1.7.0_95 +1 javac 6m 39s the patch passed +1 checkstyle 0m 21s the patch passed +1 mvnsite 1m 2s the patch passed +1 mvneclipse 0m 13s the patch passed +1 whitespace 0m 0s Patch has no whitespace issues. +1 findbugs 1m 48s the patch passed +1 javadoc 0m 51s the patch passed with JDK v1.8.0_72 +1 javadoc 1m 5s the patch passed with JDK v1.7.0_95 +1 unit 6m 43s hadoop-common in the patch passed with JDK v1.8.0_72. +1 unit 7m 8s hadoop-common in the patch passed with JDK v1.7.0_95. +1 asflicense 0m 22s Patch does not generate ASF License warnings. 57m 56s Subsystem Report/Notes Docker Image:yetus/hadoop:0ca8df7 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12788805/HADOOP-12829.patch JIRA Issue HADOOP-12829 Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 1b3a38ba5b74 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 3fab885 Default Java 1.7.0_95 Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_72 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95 findbugs v3.0.0 JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/8686/testReport/ modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/8686/console Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          sjlee0 Sangjin Lee added a comment -

          I am +1 with the change proposed here. That said, I'd like to add a little more context to this.

          I agree that as a rule one should always restore the interrupt upon catching the InterruptedException in order to make the thread interruptible. However, in this particular case the issue becomes bit academic. This thread is private to the FileSystem class, meaning that one cannot easily obtain a reference to this thread and interrupt it explicitly. This thread is also a daemon thread, and as such it will not hold up the process when the process is terminating. Those two facts combined make it acceptable to have an uninterruptible daemon thread. In a non-test scenario, interrupting this thread should not happen. So in that sense, I don't think this is a major issue one way or another (in that sense I would recommend lowering the priority of the issue to minor).

          The most important goal here is to ensure that this thread does NOT terminate under any other condition (exceptions or errors) as that would mean a catastrophic consequence for memory, which we're still doing.

          Show
          sjlee0 Sangjin Lee added a comment - I am +1 with the change proposed here. That said, I'd like to add a little more context to this. I agree that as a rule one should always restore the interrupt upon catching the InterruptedException in order to make the thread interruptible. However, in this particular case the issue becomes bit academic. This thread is private to the FileSystem class, meaning that one cannot easily obtain a reference to this thread and interrupt it explicitly. This thread is also a daemon thread, and as such it will not hold up the process when the process is terminating. Those two facts combined make it acceptable to have an uninterruptible daemon thread. In a non-test scenario, interrupting this thread should not happen. So in that sense, I don't think this is a major issue one way or another (in that sense I would recommend lowering the priority of the issue to minor). The most important goal here is to ensure that this thread does NOT terminate under any other condition (exceptions or errors) as that would mean a catastrophic consequence for memory, which we're still doing.
          Hide
          cmccabe Colin P. McCabe added a comment -

          Thank you, Gregory Chanan. I can't think of any reason why this thread should swallow InterruptedException without a trace. It is not performing any operations that should inherently generate INE, as far as I can see, and if someone else sends an INE we ought to... interrupt the thread.

          +          } catch (InterruptedException ie) {
          +            LOG.warn("cleaner thread interrupted, will stop");
          +            Thread.currentThread().interrupt();
          

          Can you include the stack trace so that this exception has a better chance of getting noticed? Also, capitalize the error? +1 pending those changes

          Show
          cmccabe Colin P. McCabe added a comment - Thank you, Gregory Chanan . I can't think of any reason why this thread should swallow InterruptedException without a trace. It is not performing any operations that should inherently generate INE, as far as I can see, and if someone else sends an INE we ought to... interrupt the thread. + } catch (InterruptedException ie) { + LOG.warn( "cleaner thread interrupted, will stop" ); + Thread .currentThread().interrupt(); Can you include the stack trace so that this exception has a better chance of getting noticed? Also, capitalize the error? +1 pending those changes
          Hide
          gchanan Gregory Chanan added a comment -

          Attached a patch. Doesn't include any tests – not sure exactly what to test. I internally tested by changing STATS_DATA_CLEANER to package-private and writing the following test:

            @Test
            public void testShutdown() throws Exception {
              FileSystem.Statistics.STATS_DATA_CLEANER.interrupt();
              FileSystem.Statistics.STATS_DATA_CLEANER.join();
            }
          

          which passes with the change and hangs without it. I'm unclear on if hadoop even wants something like this, since I'm not up to speed on how hadoop handles JVM reuse for unit tests.

          Show
          gchanan Gregory Chanan added a comment - Attached a patch. Doesn't include any tests – not sure exactly what to test. I internally tested by changing STATS_DATA_CLEANER to package-private and writing the following test: @Test public void testShutdown() throws Exception { FileSystem.Statistics.STATS_DATA_CLEANER.interrupt(); FileSystem.Statistics.STATS_DATA_CLEANER.join(); } which passes with the change and hangs without it. I'm unclear on if hadoop even wants something like this, since I'm not up to speed on how hadoop handles JVM reuse for unit tests.

            People

            • Assignee:
              gchanan Gregory Chanan
              Reporter:
              gchanan Gregory Chanan
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development