Hadoop Map/Reduce / MAPREDUCE-378

Large-scale reliability tests

Details

    • Type: Test
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

Description

    The fact that we do not have any large-scale reliability tests bothers me. I'll be the first to admit that it isn't the easiest of tasks, but I'd like to start a discussion around this, especially given that the code base is growing to the point where interactions due to small changes are very hard to predict.

    One of the simple scripts I run for every patch I work on does something very simple: run sort500 (or larger), then randomly pick n tasktrackers from ${HADOOP_CONF_DIR}/conf/slaves and kill them; a similar script kills and then restarts the tasktrackers.
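
    For illustration, a minimal sketch of what such a kill/restart script might look like (an approximation, not the actual script; it assumes passwordless ssh to the slaves, GNU shuf, the stock bin/hadoop-daemon.sh launcher, and that HADOOP_HOME is the same path on every node):

        #!/usr/bin/env bash
        # Fault-injection sketch: while a large sort (e.g. sort500) runs,
        # pick n random tasktrackers from the slaves file and kill them;
        # in "restart" mode, bring each one back up after a short pause.
        N=${1:-3}         # number of tasktrackers to hit
        MODE=${2:-kill}   # "kill" or "restart"
        SLAVES=${HADOOP_CONF_DIR}/conf/slaves

        for host in $(shuf -n "$N" "$SLAVES"); do
          echo "killing tasktracker on $host"
          ssh "$host" "pkill -f org.apache.hadoop.mapred.TaskTracker"
          if [ "$MODE" = restart ]; then
            sleep 30
            echo "restarting tasktracker on $host"
            # assumes HADOOP_HOME points at the same install on the remote node
            ssh "$host" "$HADOOP_HOME/bin/hadoop-daemon.sh start tasktracker"
          fi
        done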

    This exercises a fair number of reliability stories: lost tasktrackers, task failures, etc. Clearly it isn't enough to cover everything, but it's a start.

    Let's discuss: what do we do for HDFS? We need more for Map-Reduce!

People

    Assignee: Devaraj Das (ddas)
    Reporter: Arun Murthy (acmurthy)
    Votes: 1
    Watchers: 10

Dates

    Created:
    Updated: