ACCUMULO-2768: Agitator not restarting all datanodes

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.5.1, 1.6.0
    • Fix Version/s: 1.5.2, 1.6.1
    • Component/s: test
    • Environment: 1.6.0 RC5, hadoop 2.2.0, ZK 3.4.5, 20-node EC2 cluster

    Description

      I ran a 24-hour continuous ingest (CI) test against 1.6.0 RC5 with agitation.

      I modified the agitation settings to the following:

      # amount of time (in minutes) the agitator should sleep before killing
      KILL_SLEEP_TIME=3

      # amount of time (in minutes) the agitator should sleep after killing before running tup
      TUP_SLEEP_TIME=1

      # the minimum and maximum number of servers the agitator will kill at once
      MIN_KILL=1
      MAX_KILL=2
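
For a rough sense of scale, here is a minimal sketch of how often these settings could trigger a kill. It assumes each agitation cycle is just one kill sleep followed by one tup sleep, and that the kill and restart themselves take negligible time; the function name is made up for illustration and is not part of the agitator scripts.

```python
def max_agitation_cycles(test_minutes, kill_sleep, tup_sleep):
    """Upper bound on kill/restart cycles, assuming a cycle is only the two sleeps."""
    return test_minutes // (kill_sleep + tup_sleep)


KILL_SLEEP_TIME = 3  # minutes, from the settings above
TUP_SLEEP_TIME = 1   # minutes, from the settings above

# A 24-hour run is 1440 minutes; each assumed cycle lasts 4 minutes.
cycles = max_agitation_cycles(24 * 60, KILL_SLEEP_TIME, TUP_SLEEP_TIME)
print(cycles)  # 360
```

Under those assumptions, a cycle that kills MAX_KILL=2 servers can occur up to a few hundred times in a 24-hour run, so even a small per-cycle restart error has many chances to add up.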
      
      

      I started 3 walkers, all of which died. The walkers saw org.apache.accumulo.core.client.impl.AccumuloServerException; on the tserver, the cause was org.apache.hadoop.hdfs.BlockMissingException.

      After stopping the agitation scripts, I ran start-dfs.sh and saw it start 5 datanodes, which means 5 datanodes had been left down. Looking at datanode-agitator.pl, I think the problem is that when it kills two datanodes, it only restarts one.
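
The suspected behavior can be illustrated with a small simulation. This is only a sketch of the hypothesis above, not the logic of datanode-agitator.pl (which is Perl and is not reproduced here); the function name, host names, and kill plan are all hypothetical.

```python
def run_agitator(kill_plan, restart_all):
    """Replay a list of kill cycles and return the hosts still down at the end.

    restart_all=False models the suspected bug: only the first host killed
    in a cycle is restarted. restart_all=True models the expected behavior.
    """
    down = set()
    for killed in kill_plan:
        down.update(killed)  # kill phase
        restarted = killed if restart_all else killed[:1]
        down.difference_update(restarted)  # restart phase
    return down


# Hypothetical plan: each inner list holds the datanodes killed in one cycle.
plan = [["dn1"], ["dn2", "dn3"], ["dn4"], ["dn5", "dn6"]]

print(sorted(run_agitator(plan, restart_all=False)))  # ['dn3', 'dn6']
print(sorted(run_agitator(plan, restart_all=True)))   # []
```

In this model every two-node kill leaves one datanode down until something like start-dfs.sh brings it back, so the number of dead datanodes grows over the run. Once enough are down, some blocks may have no live replica, which would be consistent with the BlockMissingException seen on the tserver.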

      All of my ingest clients survived and were able to write 8 billion entries in this wacky environment. I noticed on the monitor that there were long periods of no ingest, but it was not a complete flat line.

      Attachments

        1. ACCUMULO-2768.patch
          1 kB
          Drew Farris

          People

            Assignee: Drew Farris (drew.farris)
            Reporter: Keith Turner (kturner)