[HADOOP-12829] StatisticsDataReferenceCleaner swallows interrupt exceptions - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 2.8.0, 2.7.3, 2.6.4
Fix Version/s: 2.9.0, 3.0.0-alpha1
Component/s: fs
Labels: None
Target Version/s: 2.9.0

Description

The StatisticsDataReferenceCleaner, implemented in HADOOP-12107, swallows interrupt exceptions. Over in Solr/Sentry land, we run thread leak checkers on our test code; they passed before this change and fail after it. Here's a sample report:

1 thread leaked from SUITE scope at org.apache.solr.handler.TestSecureReplicationHandler: 
   1) Thread[id=16, name=org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner, state=WAITING, group=TGRP-TestSecureReplicationHandler]
        at java.lang.Object.wait(Native Method)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
        at org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner.run(FileSystem.java:3040)
        at java.lang.Thread.run(Thread.java:745)

And here's an indication that the interrupt is being ignored:

25209 T16 oahf.FileSystem$Statistics$StatisticsDataReferenceCleaner.run WARN exception in the cleaner thread but it will continue to run java.lang.InterruptedException
	at java.lang.Object.wait(Native Method)
	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
	at org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner.run(FileSystem.java:3040)
	at java.lang.Thread.run(Thread.java:745)

This is inconsistent with how other long-running threads in Hadoop, e.g. PeerCache, respond to being interrupted.

The argument for doing this in HADOOP-12107 is given as (https://issues.apache.org/jira/browse/HADOOP-12107?focusedCommentId=14598397&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14598397):

Cleaner#run
Catch and log InterruptedException in the while loop, such that thread does not die on a spurious wakeup. It's safe since it's a daemon thread.

I'm unclear on what "spurious wakeup" means here, and the term is not mentioned in the Java tutorial on interrupts (https://docs.oracle.com/javase/tutorial/essential/concurrency/interrupt.html), which says:

A thread sends an interrupt by invoking interrupt on the Thread object for the thread to be interrupted. For the interrupt mechanism to work correctly, the interrupted thread must support its own interruption.

So, I believe this thread should respect interruption.
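
To illustrate the behavior being asked for, here is a minimal sketch of a reference-queue cleaner loop that stops when interrupted. It is not the attached patch or the actual Hadoop code: the class name InterruptibleCleaner, the cleanUp helper, and the stopsOnInterrupt check are hypothetical names used only for this example. The only APIs assumed are java.lang.ref.ReferenceQueue.remove(), which throws InterruptedException when the waiting thread is interrupted, and the standard Thread interrupt methods.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

// Hypothetical stand-in for a cleaner thread; not the Hadoop implementation.
public class InterruptibleCleaner implements Runnable {
    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();

    @Override
    public void run() {
        // Keep running until someone interrupts this thread.
        while (!Thread.currentThread().isInterrupted()) {
            try {
                // Blocks until a reference is enqueued or the thread is interrupted.
                Reference<?> ref = queue.remove();
                cleanUp(ref);
            } catch (InterruptedException e) {
                // Restore the interrupt flag so the loop condition ends the thread
                // instead of logging the exception and continuing.
                Thread.currentThread().interrupt();
            } catch (RuntimeException e) {
                // Other failures are logged and the cleaner keeps running.
                System.err.println("cleanup failed, continuing: " + e);
            }
        }
    }

    // Placeholder for whatever per-reference cleanup the real thread performs.
    private void cleanUp(Reference<?> ref) {
        ref.clear();
    }

    // Starts the cleaner, interrupts it, and reports whether it exited in time.
    public static boolean stopsOnInterrupt(long timeoutMillis) throws InterruptedException {
        Thread t = new Thread(new InterruptibleCleaner(), "cleaner-sketch");
        t.setDaemon(true);
        t.start();
        t.interrupt();
        t.join(timeoutMillis);
        return !t.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("stopped after interrupt: " + stopsOnInterrupt(2000));
    }
}
```

With this shape, a thread leak checker that interrupts lingering threads at the end of a test suite should see the cleaner exit rather than remain in WAITING state.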

Attachments


HADOOP-12829.patch, 22/Feb/16 22:51, 1 kB, Gregory Chanan
HADOOP-12829.patch, 20/Feb/16 02:10, 1 kB, Gregory Chanan

Issue Links

is related to: HADOOP-12107 long running apps may have a huge number of StatisticsData instances under FileSystem (Closed)

People

Assignee: Gregory Chanan
Reporter: Gregory Chanan
Votes: 0
Watchers: 7

Dates

Created: 20/Feb/16 01:19
Updated: 16/Dec/19 10:04
Resolved: 23/Feb/16 19:33