Hadoop Map/Reduce: MAPREDUCE-378

Large-scale reliability tests


Details

    • Type: Test
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      The fact that we do not have any large-scale reliability tests bothers me. I'll be the first to admit that it isn't the easiest of tasks, but I'd like to start a discussion around this, especially given that the codebase is growing to the point where the interactions caused by small changes are very hard to predict.

      One of the simple scripts I run for every patch I work on goes like this: start sort500 (or a larger sort), then randomly pick n tasktrackers from ${HADOOP_CONF_DIR}/conf/slaves and kill them. A similar script kills the chosen tasktrackers and then restarts them.

      This exercises a fair number of reliability scenarios: lost tasktrackers, task failures, etc. Clearly this isn't enough to cover everything, but it's a start.
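A minimal sketch of the kill and kill-and-restart scripts described above, written in Python. Everything specific here is an assumption, not the author's actual script: the slaves file path is passed in by the caller, slaves are reached over ssh, a tasktracker is killed with `pkill -9` matching the `org.apache.hadoop.mapred.TaskTracker` main class, and it is restarted with `$HADOOP_HOME/bin/hadoop-daemon.sh start tasktracker` as in Hadoop 0.20-era releases. The sort500 job is assumed to be started separately. Commands are only printed unless `dry_run=False`.

```python
import random
import shlex
import subprocess
from pathlib import Path

# Assumption: SIGKILL the TaskTracker JVM so it cannot shut down cleanly.
# The "[o]" regex trick keeps the pattern from matching the remote shell's
# own command line, which contains the literal string "[o]rg...".
KILL_CMD = "pkill -9 -f '[o]rg.apache.hadoop.mapred.TaskTracker'"
# Assumption: Hadoop 0.20-style daemon script, HADOOP_HOME set on each slave.
START_CMD = "$HADOOP_HOME/bin/hadoop-daemon.sh start tasktracker"


def read_slaves(path):
    """Return the host names in a slaves file, skipping blanks and comments."""
    hosts = []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            hosts.append(line)
    return hosts


def pick_victims(hosts, n, seed=None):
    """Pick up to n distinct tasktracker hosts uniformly at random."""
    rng = random.Random(seed)  # a fixed seed makes a failing run repeatable
    return rng.sample(hosts, min(n, len(hosts)))


def plan(victims, restart=False):
    """Build ssh command lines: kill every victim, then optionally restart it."""
    cmds = [["ssh", host, KILL_CMD] for host in victims]
    if restart:
        cmds += [["ssh", host, START_CMD] for host in victims]
    return cmds


def run(cmds, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in cmds:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=False)
```

For example, `run(plan(pick_victims(read_slaves(path), 3), restart=True), dry_run=False)` kills three random tasktrackers and then restarts them while the sort is running. Keeping host selection, command planning and execution separate means the same plan can be inspected in a dry run before it touches a real cluster.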

      Let's discuss: what do we do for HDFS? We need more for Map-Reduce!


          People

            Devaraj Das (ddas)
            Arun Murthy (acmurthy)
            Votes: 1
            Watchers: 10
