[ACCUMULO-509] default walog copy/sort uses replication of 1 - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.4.1, 1.5.0
Component/s: logger
Labels: None
Environment: medium size cluster

Description

During recovery, the logger copied/sorted a recovery walog to HDFS. The copy succeeded, but replaying the data hit a checksum error, and the system did not recover without manual intervention. As a work-around, I found the datanode serving the bad block and stopped it, then removed the bad recovery file and restarted the master. The copy/sort ran again using a different datanode, and recovery proceeded successfully.

We need to use a higher replication factor for the copied/sorted walog (the default is 1) and/or a more robust approach to verifying and restarting recoveries.
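
As a rough illustration of the "verify and restart" idea, here is a minimal sketch in Python using only the standard library. It is not Accumulo or HDFS code: the function names (`crc32_of`, `copy_with_verification`), the `max_attempts` parameter, and the use of CRC-32 on local files are all assumptions made for the example. It copies a file, re-reads the copy to check it against the source checksum, and discards and retries the copy on a mismatch, which is roughly what the manual work-around did by hand.

```python
import os
import shutil
import zlib


def crc32_of(path, chunk_size=1 << 16):
    """Return the CRC-32 of a file, reading it in chunks."""
    crc = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def copy_with_verification(src, dst, max_attempts=3, copy=shutil.copyfile):
    """Copy src to dst, verify the copy by re-reading it, retry on mismatch.

    `copy` is a hypothetical hook standing in for the real copy/sort step.
    Returns the number of attempts used; raises IOError if every attempt
    produces a copy whose checksum does not match the source.
    """
    expected = crc32_of(src)
    for attempt in range(1, max_attempts + 1):
        copy(src, dst)
        if crc32_of(dst) == expected:
            return attempt
        # Discard the bad copy before retrying, as in the manual work-around.
        os.remove(dst)
    raise IOError(
        "copy of %s failed verification after %d attempts" % (src, max_attempts)
    )
```

A retry like this only helps if the next attempt can land on healthy storage. In the incident above that required stopping the bad datanode by hand, so a higher replication factor would still be needed for the read side to fall back to another replica.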

People

Assignee: Keith Turner

Reporter: Eric C. Newton

Votes: 0

Watchers: 0

Dates

Created: 03/Apr/12 12:32

Updated: 29/May/12 18:34

Resolved: 29/May/12 18:34