Details

    • Type: Test
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Tests
    • Labels:
      None

      Description

      In this patch, I am adding 2 tests and a test util class.

      1) TestHAFailover: Fails over between the NameNodes a configurable number of times. Checks the service state of each NN after each failover to ensure that the service state is correct.

      2) TestKillNNFailover: Kills the active NN. Waits and fails over to the standby NN. Waits and checks that the former standby is now active. Restarts the killed NN and checks that it is now standby. Repeats this process a configurable number of times.

      3) HATestUtil: Helper methods to fail over and kill the NameNodes.
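The failover loop described for TestHAFailover can be sketched roughly as follows. This is only an illustration, not the code from the patch: the `HaAdmin` interface, the `FakeHaAdmin` in-memory stand-in, and `runFailovers` are hypothetical names introduced here so the sketch runs without a cluster. A real implementation would presumably delegate to `hdfs haadmin -failover` and `hdfs haadmin -getServiceState`.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical abstraction over the HA admin operations the tests need. */
interface HaAdmin {
    void failover(String from, String to);
    String getServiceState(String serviceId);
}

/** In-memory stand-in (an assumption) so the sketch runs without a cluster. */
class FakeHaAdmin implements HaAdmin {
    private final Map<String, String> states = new HashMap<>();

    FakeHaAdmin(String active, String standby) {
        states.put(active, "active");
        states.put(standby, "standby");
    }

    @Override
    public void failover(String from, String to) {
        states.put(from, "standby");
        states.put(to, "active");
    }

    @Override
    public String getServiceState(String serviceId) {
        return states.get(serviceId);
    }
}

class FailoverLoopSketch {
    /**
     * Fails over back and forth numFailover times and checks the service
     * state of both NameNodes after each failover.
     * Returns the ID of the NameNode that is active at the end.
     */
    static String runFailovers(HaAdmin admin, String nn1, String nn2, int numFailover) {
        String active = nn1;
        String standby = nn2;
        for (int i = 0; i < numFailover; i++) {
            admin.failover(active, standby);
            // The roles are now swapped.
            String tmp = active;
            active = standby;
            standby = tmp;
            check("active", admin.getServiceState(active), active);
            check("standby", admin.getServiceState(standby), standby);
        }
        return active;
    }

    private static void check(String expected, String actual, String id) {
        if (!expected.equals(actual)) {
            throw new AssertionError(id + " should be " + expected + " but is " + actual);
        }
    }

    public static void main(String[] args) {
        HaAdmin admin = new FakeHaAdmin("nn1", "nn2");
        System.out.println("Active after loop: " + runFailovers(admin, "nn1", "nn2", 10));
    }
}
```

Keeping the admin operations behind an interface is one way to keep the loop itself independent of where the commands are executed, which is the concern raised later in this thread.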

      1. bigtop-614.patch
        12 kB
        Sujay Rau
      2. bigtop-614.patch
        12 kB
        Sujay Rau
      3. HATests.patch
        11 kB
        Stephen Chu

        Issue Links

          Activity

          Stephen Chu added a comment -

          Submitted a patch to trunk.

          Sujay Rau added a comment - - edited

          NameNodes in an HA setup should always be specified in hdfs-site.xml, so changing setUp() in your tests to the following might remove the need to specify -Dservice.id.1=nn1 and -Dservice.id.2=nn2 on the command line.

          import org.apache.hadoop.conf.Configuration;

          static void setUp() {
            Configuration conf = new Configuration();
            conf.addResource("hdfs-site.xml");
            // Comma-separated NameNode IDs of the "ha-nn-uri" nameservice
            String namenodes = conf.get("dfs.ha.namenodes.ha-nn-uri");
            service_id_1 = namenodes.split(",")[0];
            service_id_2 = namenodes.split(",")[1];
            assertTrue("Unspecified service id 1", service_id_1 != null);
            assertTrue("Unspecified service id 2", service_id_2 != null);
            num_failover = Integer.parseInt(System.getProperty("num.failover", "10"));
          }
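One caveat with the split-based lookup above: `String.split` never returns null elements, so the two null checks cannot fail, while a missing property or a single-entry value would surface as a NullPointerException or ArrayIndexOutOfBoundsException instead. A minimal sketch of a stricter parser follows. The `ServiceIds` class and its `parse` method are hypothetical names, and the input is assumed to be the raw string read from `dfs.ha.namenodes.ha-nn-uri`.

```java
import java.util.ArrayList;
import java.util.List;

class ServiceIds {
    /**
     * Splits the comma-separated value of dfs.ha.namenodes.<nameservice>
     * into trimmed, non-empty IDs. Throws if the value is missing or
     * names fewer than two NameNodes.
     */
    static List<String> parse(String namenodes) {
        if (namenodes == null) {
            throw new IllegalArgumentException("NameNode list is not set");
        }
        List<String> ids = new ArrayList<>();
        for (String part : namenodes.split(",")) {
            String id = part.trim();
            if (!id.isEmpty()) {
                ids.add(id);
            }
        }
        if (ids.size() < 2) {
            throw new IllegalArgumentException(
                "Expected at least two NameNode IDs but found: " + ids);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<String> ids = parse("nn1, nn2");
        System.out.println("service_id_1=" + ids.get(0) + ", service_id_2=" + ids.get(1));
    }
}
```

In setUp(), the first two entries of the returned list could then be assigned to service_id_1 and service_id_2.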

          Sujay Rau added a comment -

          I am currently modifying these tests to remove the required command-line options and to let the test run from any NameNode, not just the active one. Right now the test doesn't work if it is run from the standby NameNode.

          Roman Shaposhnik added a comment -

          Sujay, I agree with your comments above. Also, please include the following cleanup in your new patch:

          1. There doesn't seem to be any need for a shell class in TestHAFailover and TestKillNNFailover.
          2. The test shouldn't assume that it is executed under any specific user account (e.g. root). If something requires root privileges, it needs to be executed via a root shell. Something like shRoot = new Shell("/bin/bash -s", "root")
          3. Please get rid of the System.out.println calls around shell calls. All shell tracing is done via log4j.
          4. It would be nice to get rid of the assertNotNull("JAVA_HOME has to be set to run this test", JAVA_HOME) and assertNotNull("HADOOP_HOME has to be set to run this test", HADOOP_HOME) in HATestUtil.
          Sujay Rau added a comment -

          Updated patch for the three described files.

          Sujay Rau added a comment -

          Roman, on my dfsadmin test you said that the test can't assume it is being run from the NameNode. This is hard to do for HA tests, since failovers need to be done from the NameNode (I think...?). Do you have any recommendations for dealing with this? I am not sure how to make HA tests general enough to run from any node in the cluster.

          Also, I'll fix the Shell user issue for the tests I have submitted so far. It makes much more sense now. Thanks

          Roman Shaposhnik added a comment -

          Sujay, this is a pretty perplexing problem that we've been trying not to notice so far: we need an infrastructure for doing what you want, but we don't have any so far. Let me file a separate JIRA over the weekend that will aim at (finally!) addressing this issue.

          The trouble is, though, that I won't be able to work on it for quite some time. Do you feel like you'd be interested in tackling this?

          Meanwhile, we can still check the test code in, but it'll have to be disabled during actual test runs, I suppose. Hm. Let me think about this over the weekend.

          Stephen Chu added a comment - - edited

          Yes, definitely worth working on building this infrastructure now. It will be very helpful to expand other tests, too (e.g. fault injection HBase tests).

          I am interested in being involved as well.

          Also, is there a way for me to re-assign this issue to Sujay? I don't see him as a user I can assign issues to.

          Sujay Rau added a comment -

          I'd definitely be interested in working on this problem, but I need help figuring out a starting point.

          Sujay Rau added a comment -

          Updated the util file from the previous patch I submitted.

          Sujay Rau added a comment -

          last patch had an error

          Roman Shaposhnik added a comment -

          I am canceling this patch for now, since we all seem to agree that it would make much more sense to abstract the tests away from the requirement of executing directly on the NN, via BIGTOP-635.


            People

            • Assignee:
              Stephen Chu
              Reporter:
              Stephen Chu
            • Votes:
              0
              Watchers:
              4
