Details

    • Type: Test
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      The fact that we do not have any large-scale reliability tests bothers me. I'll be the first to admit that it isn't the easiest of tasks, but I'd like to start a discussion around this, especially given that the code-base is growing to the point where the interactions caused by small changes are very hard to predict.

      One of the simple scripts I run for every patch I work on does something very simple: it runs sort500 (or greater), then randomly picks n tasktrackers from ${HADOOP_CONF_DIR}/conf/slaves and kills them; a similar script kills and then restarts the tasktrackers.
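
      A minimal sketch of such a chaos script, assuming passwordless ssh to the slave nodes and that hadoop-daemon.sh is installed on each of them; the parameter names below are illustrative, not part of any existing patch:

        #!/usr/bin/env bash
        # Sketch: randomly pick n tasktrackers from the slaves file and kill
        # them while a sort job is running; optionally restart them afterwards.
        # Assumes passwordless ssh and hadoop-daemon.sh on every slave node.
        NUM_TO_KILL=${1:-3}      # illustrative: how many tasktrackers to kill
        RESTART=${2:-false}      # illustrative: "true" to restart after killing

        SLAVES_FILE=${HADOOP_CONF_DIR}/conf/slaves

        # Pick n random hosts from the slaves file (GNU shuf).
        for host in $(shuf -n "${NUM_TO_KILL}" "${SLAVES_FILE}"); do
          echo "Stopping tasktracker on ${host}"
          ssh "${host}" "${HADOOP_HOME}/bin/hadoop-daemon.sh stop tasktracker"

          if [ "${RESTART}" = "true" ]; then
            sleep 60   # give the JobTracker time to notice the lost tracker
            echo "Restarting tasktracker on ${host}"
            ssh "${host}" "${HADOOP_HOME}/bin/hadoop-daemon.sh start tasktracker"
          fi
        done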

      This helps in checking a fair number of reliability stories: lost tasktrackers, task failures, etc. Clearly this isn't good enough to cover everything, but it's a start.

      Let's discuss: what do we do for HDFS? We need more for Map-Reduce!


          People

          • Assignee: Devaraj Das
          • Reporter: Arun C Murthy
          • Votes: 1
          • Watchers: 10
