Hadoop Common / HADOOP-12829

StatisticsDataReferenceCleaner swallows interrupt exceptions


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.8.0, 2.7.3, 2.6.4
    • Fix Version/s: 2.9.0, 3.0.0-alpha1
    • Component/s: fs
    • Labels: None

    Description

      The StatisticsDataReferenceCleaner, implemented in HADOOP-12107, swallows interrupt exceptions. Over in Solr/Sentry land, we run thread leak checkers on our test code; the tests passed before this change and fail after it. Here's a sample report:

      1 thread leaked from SUITE scope at org.apache.solr.handler.TestSecureReplicationHandler: 
         1) Thread[id=16, name=org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner, state=WAITING, group=TGRP-TestSecureReplicationHandler]
              at java.lang.Object.wait(Native Method)
              at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
              at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
              at org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner.run(FileSystem.java:3040)
              at java.lang.Thread.run(Thread.java:745)
      

      And here's an indication that the interrupt is being ignored:

      25209 T16 oahf.FileSystem$Statistics$StatisticsDataReferenceCleaner.run WARN exception in the cleaner thread but it will continue to run java.lang.InterruptedException
      	at java.lang.Object.wait(Native Method)
      	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
      	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
      	at org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner.run(FileSystem.java:3040)
      	at java.lang.Thread.run(Thread.java:745)
      

      This is inconsistent with how other long-running threads in Hadoop, e.g. PeerCache, respond to being interrupted.

      The argument for doing this in HADOOP-12107 is given as (https://issues.apache.org/jira/browse/HADOOP-12107?focusedCommentId=14598397&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14598397):

      Cleaner#run
      Catch and log InterruptedException in the while loop, such that thread does not die on a spurious wakeup. It's safe since it's a daemon thread.

      I'm unclear on what "spurious wakeup" means here, and the term is not mentioned in https://docs.oracle.com/javase/tutorial/essential/concurrency/interrupt.html:

      A thread sends an interrupt by invoking interrupt on the Thread object for the thread to be interrupted. For the interrupt mechanism to work correctly, the interrupted thread must support its own interruption.

      So, I believe this thread should respect interruption.
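
      To illustrate what respecting interruption could look like, here is a minimal sketch, assuming a cleaner loop that blocks on ReferenceQueue.remove() as in the stack traces above. The class name InterruptibleCleaner and the helper stopsOnInterrupt are hypothetical; this is not the attached patch or the actual Hadoop code.

```java
import java.lang.ref.ReferenceQueue;

// Hypothetical stand-in for the cleaner thread; not the actual Hadoop class.
class InterruptibleCleaner implements Runnable {
    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                // Blocks until a reference is enqueued or the thread is interrupted.
                queue.remove();
                // ... clean up the data associated with the reference here ...
            } catch (InterruptedException e) {
                // Restore the interrupt flag and exit, rather than logging and looping.
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    // Starts the cleaner, interrupts it, and reports whether it terminated.
    static boolean stopsOnInterrupt() {
        Thread t = new Thread(new InterruptibleCleaner(), "cleaner-sketch");
        t.setDaemon(true);
        t.start();
        t.interrupt();
        try {
            t.join(5000); // wait up to five seconds for the thread to exit
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !t.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("terminated after interrupt: " + stopsOnInterrupt());
    }
}
```

      With a loop like this, a test harness that interrupts leftover threads at the end of a suite should see the thread exit instead of finding it parked in ReferenceQueue.remove() again.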

      Attachments

        1. HADOOP-12829.patch
          1 kB
          Gregory Chanan
        2. HADOOP-12829.patch
          1 kB
          Gregory Chanan


          People

            Assignee: Gregory Chanan (gchanan)
            Reporter: Gregory Chanan (gchanan)
            Votes: 0
            Watchers: 7
