Whirr / WHIRR-342

hadoop/hbase configuration & active roles on a node

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.7.0
    • Labels:
      None

      Description

      The following limitations exist with the generation of hadoop-(core|hdfs|mapred).xml and hbase-site.xml (assuming WHIRR-339 applied):

      • they are not generated by all roles (e.g. tasktracker, thrift server, ...). As a consequence, running these roles by themselves on a node, unaccompanied by a role that does generate them, will not work.
      • running two roles on the same node that generate the same files does not work as it should, as the generated content gets appended twice to the same file, causing non-well-formed XML. This is because of the use of jclouds' Statements.appendFile.

      The cheapest solution would be to replace Statements.appendFile with something similar but without the 'append' behavior, thus rather a 'Statements.overwriteFile' (not available in jclouds afaics).

      This of course assumes that when different roles write the same files, they put the same contents in them, so that the overwriting does not matter.

      Alternatively, things could be made smarter so that the same configuration is only generated once for all roles.
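      The difference between appending and overwriting can be illustrated with a small local sketch. It does not use jclouds: appendFile and overwriteFile below are hypothetical stand-ins that write to a local temporary file, and the property value is made up. The sketch only assumes, as the description does, that both roles write identical contents.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.helpers.DefaultHandler;

class AppendVsOverwrite {
  // Illustrative contents; the host name and port are invented for this sketch.
  static final String SITE_XML =
      "<?xml version=\"1.0\"?>\n"
      + "<configuration>\n"
      + "  <property><name>fs.default.name</name><value>hdfs://namenode.example:8020/</value></property>\n"
      + "</configuration>\n";

  // Stand-in for 'append' behavior: every call adds another copy to the end of the file.
  static void appendFile(Path file, String contents) throws IOException {
    Files.write(file, contents.getBytes(StandardCharsets.UTF_8),
        StandardOpenOption.CREATE, StandardOpenOption.APPEND);
  }

  // Hypothetical 'overwriteFile': truncate first, so repeated calls leave a single copy.
  static void overwriteFile(Path file, String contents) throws IOException {
    Files.write(file, contents.getBytes(StandardCharsets.UTF_8),
        StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING,
        StandardOpenOption.WRITE);
  }

  // Returns true if the file parses as a well-formed XML document.
  static boolean isWellFormed(Path file) {
    try {
      DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
      // DefaultHandler rethrows fatal errors instead of printing them to stderr.
      builder.setErrorHandler(new DefaultHandler());
      builder.parse(file.toFile());
      return true;
    } catch (Exception e) {
      return false;
    }
  }

  public static void main(String[] args) throws IOException {
    Path appended = Files.createTempFile("hadoop-site-append", ".xml");
    Path overwritten = Files.createTempFile("hadoop-site-overwrite", ".xml");
    // Two roles on the same node each write the same file once.
    for (int role = 0; role < 2; role++) {
      appendFile(appended, SITE_XML);
      overwriteFile(overwritten, SITE_XML);
    }
    System.out.println("appended twice, well-formed: " + isWellFormed(appended));
    System.out.println("overwritten twice, well-formed: " + isWellFormed(overwritten));
  }
}
```

      With append semantics the file ends up with two XML declarations and two root elements, which a parser rejects; with overwrite semantics the second write simply replaces the first.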

      1. WHIRR-342.patch
        8 kB
        Karel Vervaeke
      2. WHIRR-342.patch
        8 kB
        Karel Vervaeke
      3. WHIRR-342.patch
        32 kB
        Karel Vervaeke
      4. WHIRR-342.patch
        42 kB
        Karel Vervaeke
      5. WHIRR-342.patch
        45 kB
        Karel Vervaeke

          Activity

          Andrei Savu added a comment -

          Committed. Thanks Karel! Thanks Tom for reviewing.

          Andrei Savu added a comment -

          I'm going to commit this now and I will make the needed changes in WHIRR-400 so that we can keep things moving.

          Andrei Savu added a comment -

          If everything works fine we should be able to upgrade to jclouds 1.2.1 today and we should be able to simplify this patch (see WHIRR-400).

          Tom White added a comment -

          +1 looks good to me.

          Karel Vervaeke added a comment -

          This update adds a testcase which runs all hadoop-* roles on separate nodes (1 node per role).

          Karel Vervaeke added a comment -

          The patch includes a test for the single node case which works.
          It does not include a test for the separate nodes case. The separate nodes case doesn't work with this patch (the jobtracker sometimes (always?) fails to come up - probably just a timing issue).
          See HadoopServiceTest / HadoopSingleNodeServiceTest.

          Karel Vervaeke added a comment -

          Updated patch

          Andrei Savu added a comment -

          Karel, do you think we can refresh this for the current trunk? I know you have a busy schedule and I can help with testing & development as needed. Thanks!

          Andrei Savu added a comment -

          +1 for refreshing and getting this in as soon as possible. We can make the needed changes as we update to jclouds 1.2.0.

          Tom White added a comment -

          > I have this idea that you should be able to reconfigure your cluster by re-running whirr launch-cluster

          This makes more sense with WHIRR-294, where you can run the configure script by itself. But I'm not opposed to doing it the way you suggest here, as long as the scripts work when run twice.

          > Apparently Eclipse has a default setting which is non-alphabetical. I wonder if we could generate the right eclipse configuration during mvn eclipse:eclipse (ditto for 2-space indents).

          That would be nice, but in the meantime it's probably easier to change the setting manually.

          > I'll take the other remarks into account for the next patch update (after jclouds 1.2.0).

          I think it's worth pressing on with this patch, and we can remove CreateFileStatement after jclouds 1.2.0 is released and used in Whirr.

          Andrei Savu added a comment -

          > I have this idea that you should be able to reconfigure your cluster by re-running whirr launch-cluster

          I like this. If we go down this path we should probably consider renaming launch-cluster to something more appropriate that could also allow semantics like adding / removing nodes based on changes made to the .properties file.

          Karel Vervaeke added a comment -

          @asavu:

          • CreateFileStatement is no longer needed when we upgrade to jclouds 1.2.0. I'll update the patch after the upgrade.

          @tomwhite:

          • I have this idea that you should be able to reconfigure your cluster by re-running whirr launch-cluster (this works more or less in byon environments - assuming role assignment is deterministic). In that context a variable check is better than a directory check (to allow reconfiguration).

          Apparently Eclipse has a default setting which is non-alphabetical. I wonder if we could generate the right eclipse configuration during mvn eclipse:eclipse (ditto for 2-space indents).

          I'll take the other remarks into account for the next patch update (after jclouds 1.2.0).

          Tom White added a comment -

          This looks good to me. A few comments:

          • In configure_cdh_hadoop.sh CONFIGURE_HADOOP_DONE is never read. I think it would be more robust to check for the existence of a directory to test for previous installation or configuration (e.g. /usr/local/hadoop, /etc/hadoop/conf).
          • HadoopClusterActionHandler#afterConfigure() should be moved to the subclasses since it should not be called by HadoopDataNodeClusterActionHandler and HadoopTaskTrackerClusterActionHandler, and for HadoopNameNodeClusterActionHandler and HadoopJobTrackerClusterActionHandler they should do different things. The local files (config, proxy) should only be written in one place - e.g. by HadoopNameNodeClusterActionHandler.
          • The imports are still shuffled. Can you change your default import order so that "import com..." comes before "import java..."?
          • I think that a Hadoop integration test for the separate nodes case would be useful.
          • It would be nice to support HDFS-only clusters too (i.e. make MapReduce optional), but this can be done separately.
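          The suggestion about moving afterConfigure() into the subclasses could look roughly like the sketch below. It uses simplified stand-in classes, not the actual Whirr handler classes or their signatures; the class names, the List parameter and the local file names are assumptions made for illustration only. The base class does nothing by default, so the datanode and tasktracker stand-ins never write local files, and only the namenode stand-in does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AfterConfigureSketch {
  // Stand-in base handler: by default a role does nothing after configuration.
  static abstract class RoleHandler {
    abstract String role();

    void afterConfigure(List<String> localFiles) {
      // Intentionally empty; subclasses opt in.
    }
  }

  // The only place where local files (config, proxy) are recorded as written.
  static class NameNodeHandler extends RoleHandler {
    String role() { return "hadoop-namenode"; }

    @Override
    void afterConfigure(List<String> localFiles) {
      localFiles.add("hadoop-site.xml");  // illustrative file name
      localFiles.add("hadoop-proxy.sh");  // illustrative file name
    }
  }

  // Does something different from the namenode and writes no local files.
  static class JobTrackerHandler extends RoleHandler {
    String role() { return "hadoop-jobtracker"; }

    @Override
    void afterConfigure(List<String> localFiles) {
      System.out.println(role() + " configured; no local files written");
    }
  }

  // These two inherit the no-op afterConfigure().
  static class DataNodeHandler extends RoleHandler {
    String role() { return "hadoop-datanode"; }
  }

  static class TaskTrackerHandler extends RoleHandler {
    String role() { return "hadoop-tasktracker"; }
  }

  // Runs afterConfigure() for every handler and returns the local files written.
  static List<String> runAfterConfigure(List<RoleHandler> handlers) {
    List<String> localFiles = new ArrayList<>();
    for (RoleHandler handler : handlers) {
      handler.afterConfigure(localFiles);
    }
    return localFiles;
  }

  public static void main(String[] args) {
    List<RoleHandler> handlers = Arrays.asList(
        new NameNodeHandler(), new JobTrackerHandler(),
        new DataNodeHandler(), new TaskTrackerHandler());
    System.out.println("local files written: " + runAfterConfigure(handlers));
  }
}
```

          Regardless of how many roles run, each local file is recorded once, because only one handler is responsible for it.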
          Andrei Savu added a comment -

          I just had a quick look and I have two questions:

          • Are we still going to need CreateFileStatement when we upgrade to jclouds 1.2.0?
          • In HBase090SingleNodeServiceTest.java you are referencing whirr-hbase-0.90-test.properties and not the new file whirr-hbase-0.90-singlenode-test.properties. Is this what you want?

          > I didn't add an integration test for the separate nodes case. Do we need/want one?

          I think we should have one to catch any errors later when we are going to do more work on the Hadoop scripts.

          Karel Vervaeke added a comment -

          This patch makes it possible to run everything on a single node and to run everything on separate nodes.

          I didn't add an integration test for the separate nodes case. Do we need/want one?

          Karel Vervaeke added a comment -

          Created a pull request for jclouds.
          https://github.com/jclouds/jclouds/pull/85

          I changed the implementation a bit to make it a better fit with jclouds (IMHO).

          Tom White added a comment -

          +1 looks good. There is some minor import statement shuffling that shouldn't really be in the patch.

          > It would be better if CreateFileStatement was added in jclouds (in that form or in another form).

          Is it worth opening a jclouds issue for this?

          Karel Vervaeke added a comment -

          Attached patch solves the issue of having multiple roles on a single node (hadoop-* + hbase-*).

          I can't look into splitting namenode+jobtracker right now.

          It would be better if CreateFileStatement was added in jclouds (in that form or in another form).

          Andrei Savu added a comment -

          Moving to 0.7.0 so that we can look for a better fix. We should either find a way of removing the limitation or enforce it with strict config validation.
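          As a rough illustration of what strict config validation might check, the sketch below parses a whirr.instance-templates style value and flags any node template in which no role, or more than one role, generates the configuration files. The class, the validate method and the set of generating roles are hypothetical; which roles actually generate the files is not stated here and is passed in as a parameter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class RoleValidationSketch {
  // Checks a value such as "1 hadoop-namenode+hadoop-jobtracker,5 hadoop-datanode+hadoop-tasktracker".
  // generatingRoles: roles assumed to generate the shared configuration files.
  static List<String> validate(String instanceTemplates, Set<String> generatingRoles) {
    List<String> errors = new ArrayList<>();
    for (String template : instanceTemplates.split(",")) {
      // Each template is "<count> <role>+<role>+...".
      String[] countAndRoles = template.trim().split("\\s+", 2);
      if (countAndRoles.length < 2) {
        errors.add("malformed template: '" + template.trim() + "'");
        continue;
      }
      String roles = countAndRoles[1].trim();
      int generators = 0;
      for (String role : roles.split("\\+")) {
        if (generatingRoles.contains(role.trim())) {
          generators++;
        }
      }
      if (generators == 0) {
        errors.add(roles + ": no role on this node generates the configuration files");
      } else if (generators > 1) {
        errors.add(roles + ": " + generators + " roles would write the same configuration files");
      }
    }
    return errors;
  }

  public static void main(String[] args) {
    // ASSUMPTION for illustration only: these two roles generate the files.
    Set<String> generating = new HashSet<>(Arrays.asList("hadoop-namenode", "hadoop-datanode"));
    System.out.println(validate(
        "1 hadoop-namenode+hadoop-jobtracker,5 hadoop-datanode+hadoop-tasktracker", generating));
    System.out.println(validate(
        "1 hadoop-namenode+hadoop-jobtracker,5 hadoop-tasktracker", generating));
  }
}
```

          A check of this kind would reject unsupported role combinations at launch time instead of letting the cluster come up with missing or malformed configuration files.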

          Andrei Savu added a comment -

          We should start by at least documenting this behavior until we have a patch.


            People

            • Assignee:
              Karel Vervaeke
              Reporter:
              Bruno Dumon