[HBASE-5973] Add ability for potentially long-running IPC calls to abort if client disconnects - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 0.90.7, 0.92.1, 0.94.0, 0.95.2
Fix Version/s: 0.94.1, 0.95.0
Component/s: IPC/RPC
Labels: None
Hadoop Flags: Reviewed

Description

We recently had a cluster issue where a user was submitting scanners with a very restrictive filter and then calling next() with a high scanner caching value. The clients would generally time out the next() call and disconnect, but the server-side IPC call kept running, trying to fill the requested number of rows. Since this was in the context of MapReduce (MR), the tasks making the calls would retry, and the retries would be more likely to time out due to contention with the previous, still-running scanner next() call. Eventually, the system spiraled out of control.

We should add a hook to the IPC system so that RPC calls can check whether the client has already disconnected. In that case, the next() call could abort processing, since any further work would be wasted. I imagine coprocessor endpoints, etc., could make good use of this as well.
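
The proposed hook can be sketched as follows. This is a minimal, self-contained illustration, not the patch attached to this issue: the `CallContext` interface, its `isClientDisconnected()` method, `ClientDisconnectedException`, the `next(...)` helper and its `checkInterval` parameter are all hypothetical names chosen for the example, and rows are modeled as plain integers. In a real server the disconnect flag would presumably be set by the connection-handling code when it notices the client socket has closed; that wiring is assumed here, not shown.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.IntStream;

public class DisconnectAwareScan {

    // HYPOTHETICAL: a per-call context the IPC layer would hand to handlers.
    interface CallContext {
        boolean isClientDisconnected();
    }

    // HYPOTHETICAL: thrown to abort work whose caller has already gone away.
    static class ClientDisconnectedException extends RuntimeException {
        ClientDisconnectedException(String msg) {
            super(msg);
        }
    }

    // Sketch of a server-side next(): gathers up to `caching` rows that pass
    // `filter`, polling the call context every `checkInterval` examined rows.
    static List<Integer> next(Iterator<Integer> rows, Predicate<Integer> filter,
                              int caching, CallContext ctx, int checkInterval) {
        List<Integer> results = new ArrayList<>();
        long examined = 0;
        while (results.size() < caching && rows.hasNext()) {
            Integer row = rows.next();
            examined++;
            // Periodic check: once the client is gone, further work is wasted.
            if (examined % checkInterval == 0 && ctx.isClientDisconnected()) {
                throw new ClientDisconnectedException(
                        "client disconnected after examining " + examined + " rows");
            }
            if (filter.test(row)) {
                results.add(row);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Very restrictive filter: only one row in 100,000 matches.
        Predicate<Integer> restrictive = r -> r % 100_000 == 0;

        // Client still connected: the call fills all 5 requested rows.
        List<Integer> rows = next(IntStream.range(0, 1_000_000).iterator(),
                restrictive, 5, () -> false, 1_000);
        System.out.println("connected client got " + rows);

        // Client already gone: the handler stops at its first check.
        try {
            next(IntStream.range(0, 1_000_000).iterator(),
                    restrictive, 5, () -> true, 1_000);
        } catch (ClientDisconnectedException e) {
            System.out.println("aborted: " + e.getMessage());
        }
    }
}
```

Checking only every `checkInterval` rows keeps the overhead small if the disconnect check is more expensive than reading a flag; with a restrictive filter, most of the cost is in rows examined rather than rows returned, so the check is keyed to examined rows. A coprocessor endpoint doing a long loop could poll the same context in the same way.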

Attachments

- hbase-5973.txt (09/May/12 22:02, 12 kB, Todd Lipcon)
- hbase-5973.txt (10/May/12 06:21, 12 kB, Todd Lipcon)
- hbase-5973.txt (10/May/12 16:45, 12 kB, Todd Lipcon)
- hbase-5973-0.94.txt (10/May/12 16:48, 8 kB, Todd Lipcon)
- hbase-5973-0.94.txt (10/May/12 17:04, 11 kB, Todd Lipcon)
- hbase-5973-0.92.txt (10/May/12 17:08, 12 kB, Todd Lipcon)
- HBASE-5973-0.90.txt (11/May/12 15:06, 10 kB, David S. Wang)

Issue Links

relates to: HBASE-5757 TableInputFormat should handle as many errors as possible (Closed)

People

Assignee: Todd Lipcon

Reporter: Todd Lipcon

Votes: 0

Watchers: 8

Dates

Created: 09/May/12 18:58

Updated: 26/Feb/13 08:16

Resolved: 11/May/12 19:28