[HBASE-4222] Make HLog more resilient to write pipeline failures - ASF JIRA


Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.90.5
Component/s: wal
Labels: None

Description

The current implementation of HLog rolling to recover from transient errors in the write pipeline seems to have two problems:

1. When HLog.LogSyncer hits an IOException during a time-based sync, it requests a log roll in the corresponding catch block, but only after escaping from the internal while loop. As a result, the LogSyncer thread exits and, from what I can tell, is never restarted, even if the log roll succeeds.
2. Log rolling requests triggered by an IOException in sync() or append() never happen if no entries have yet been written to the log. This means that write errors are not recovered from immediately, which leaves the pipeline exposed to further errors for longer.
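
The two problems above can be illustrated with a simplified syncer loop. This is a minimal sketch, not HBase's actual HLog.LogSyncer: the Syncable and RollRequester interfaces, the runSyncLoop method and its iteration bound are hypothetical, and the periodic sleep a real syncer performs between syncs is omitted. It shows the shape the description argues for: the catch block sits inside the loop, so a failed sync requests a log roll and the thread keeps running, and the roll request is not gated on any entries having been written.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical, simplified model of a time-based WAL syncer thread.
 * Not HBase's HLog.LogSyncer; all names here are illustrative.
 */
public class SyncerSketch {

    /** Hypothetical stand-in for the WAL writer's sync operation. */
    interface Syncable {
        void sync() throws IOException;
    }

    /** Hypothetical callback used to ask for a log roll. */
    interface RollRequester {
        void requestRoll();
    }

    private final Syncable writer;
    private final RollRequester roller;
    private final AtomicBoolean closed = new AtomicBoolean(false);

    SyncerSketch(Syncable writer, RollRequester roller) {
        this.writer = writer;
        this.roller = roller;
    }

    /**
     * Runs up to maxIterations sync attempts and returns how many were made.
     * A real syncer would loop until shutdown and sleep between syncs.
     */
    int runSyncLoop(int maxIterations) {
        int completed = 0;
        for (int i = 0; i < maxIterations && !closed.get(); i++) {
            try {
                writer.sync();
            } catch (IOException e) {
                // Handled INSIDE the loop (problem 1): request a roll and keep
                // syncing instead of letting the exception end the thread.
                // The request is unconditional (problem 2): it does not check
                // whether any entries have been written yet.
                roller.requestRoll();
            }
            completed++;
        }
        return completed;
    }

    /** Stops the loop at the start of its next iteration. */
    void close() {
        closed.set(true);
    }
}
```

Because the exception is handled per iteration, a single pipeline failure costs one roll request rather than the syncer thread itself.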

In addition, it seems like we should be able to handle transient problems better, such as a rolling restart of DataNodes while the HBase RegionServers are running. Currently this reliably causes RegionServer aborts during log rolling: either an append or a time-based sync triggers an initial IOException, which initiates a log rolling request. The log roll then fails while closing the current writer ("All datanodes are bad"), causing a RegionServer abort. In this case, we should at least provide an option to continue with the new writer and abort only on subsequent errors.
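
The option proposed above, continuing with the new writer when closing the old one fails and aborting only on subsequent errors, might look roughly like the following sketch. Everything in it is hypothetical: the Writer and WriterFactory interfaces, the closeErrorsTolerated setting, and the convention that an IOException escaping roll() means the RegionServer should abort are illustrative assumptions, not HBase's API.

```java
import java.io.IOException;

/**
 * Hypothetical sketch of a log roll that tolerates a bounded number of
 * consecutive failures when closing the old writer.
 */
public class TolerantRollSketch {

    /** Hypothetical WAL writer; only close() matters here. */
    interface Writer {
        void close() throws IOException;
    }

    /** Hypothetical factory that opens a writer on a fresh pipeline. */
    interface WriterFactory {
        Writer create() throws IOException;
    }

    private final WriterFactory factory;
    private final int closeErrorsTolerated; // hypothetical setting
    private int consecutiveCloseErrors = 0;
    private Writer current;

    TolerantRollSketch(WriterFactory factory, int closeErrorsTolerated) throws IOException {
        this.factory = factory;
        this.closeErrorsTolerated = closeErrorsTolerated;
        this.current = factory.create();
    }

    /**
     * Rolls to a new writer. In this sketch, any IOException that escapes
     * means "abort": either the new writer could not be created, or closing
     * the old writer failed more times in a row than tolerated.
     */
    void roll() throws IOException {
        Writer next = factory.create(); // open the new pipeline first
        Writer old = current;
        current = next;                 // subsequent writes go to the new writer
        try {
            old.close();
            consecutiveCloseErrors = 0; // a clean close resets the count
        } catch (IOException e) {
            consecutiveCloseErrors++;
            if (consecutiveCloseErrors > closeErrorsTolerated) {
                throw new IOException("Too many consecutive close errors: "
                        + consecutiveCloseErrors, e);
            }
            // Tolerated: keep going with the new writer. A real WAL must still
            // deal with edits written only to 'old'; this sketch ignores that.
        }
    }

    Writer currentWriter() {
        return current;
    }

    int consecutiveCloseErrors() {
        return consecutiveCloseErrors;
    }
}
```

With closeErrorsTolerated set to 1, one "All datanodes are bad" failure while closing the old writer is absorbed and a second consecutive one is escalated; a successful close resets the counter.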

Attachments


- HBASE-4222_0.90.patch (15 kB, Gary Helmling, 25/Aug/11 03:27)
- HBASE-4222_trunk_final.patch (17 kB, Gary Helmling, 24/Aug/11 23:43)

Issue Links

relates to: HBASE-4274 RS should periodically ping its HLog pipeline even if no writes are arriving (Closed)


People

Assignee: Gary Helmling
Reporter: Gary Helmling
Votes: 0
Watchers: 8

Dates

Created: 18/Aug/11 01:23
Updated: 20/Nov/15 11:54
Resolved: 25/Aug/11 03:45