Whirr / WHIRR-55

Users should be able to override an arbitrary Hadoop property before launch

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.4.0
    • Component/s: service/hadoop
    • Labels: None
    Attachments

    1. WHIRR-55.patch (35 kB) Tom White
    2. WHIRR-55.patch (35 kB) Tom White
    3. WHIRR-55.patch (35 kB) Tom White
    4. WHIRR-55.patch (35 kB) Tom White
    5. WHIRR-55.patch (44 kB) Tom White


        Activity

        Erik Frey added a comment -

        Would this include, for example, the dfs.replication setting? If so, yes, this improvement would be very helpful!

        Tom White added a comment -

        Yes, it would include exactly that. Are you interested in implementing it?

        Tom White added a comment -

        Here's an initial attempt at this (for the Java implementation). Configuration is generated by a HadoopConfigurationBuilder, and is pushed to a file on cluster nodes using jclouds' Statements.createFile call.

        HadoopConfigurationBuilder takes care of dynamic properties like fs.default.name and mapred.job.tracker, which depend on the cluster object. It may be extended in the future to set mapred.reduce.tasks according to the number of slots in the cluster, or mapred.tasktracker.{map,reduce}.tasks.maximum according to the number of CPUs on each instance.
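
        As a rough illustration of the mechanism described above (not the patch itself), a generated site file could be pushed to a node roughly as follows. The path and XML body are invented for the sketch, and the createFile signature is an assumption; only the Statements.createFile name comes from this comment.

        // Hedged sketch: writing a generated Hadoop site file to a cluster
        // node via jclouds' script builder. Path and contents are invented
        // for illustration; createFile(String, Iterable<String>) is an
        // assumed signature.
        import org.jclouds.scriptbuilder.domain.Statement;
        import org.jclouds.scriptbuilder.domain.Statements;

        import com.google.common.collect.ImmutableList;

        public class PushHdfsSiteExample {
          public static Statement hdfsSiteStatement() {
            return Statements.createFile(
                "/usr/local/hadoop/conf/hdfs-site.xml",   // path assumed
                ImmutableList.of(
                    "<?xml version=\"1.0\"?>",
                    "<configuration>",
                    "  <property>",
                    "    <name>dfs.replication</name>",
                    "    <value>2</value>",
                    "  </property>",
                    "</configuration>"));
          }
        }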

        Properties may be overridden by specifying them in the Whirr configuration. For example, to override Hadoop's dfs.replication property with a value of 2, you would add

        hadoop-hdfs.dfs.replication=2
        

        to your Whirr properties file. The hadoop-hdfs prefix signifies that the property should go in hdfs-site.xml. (This patch also incorporates WHIRR-149.)
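
        To make the prefix convention concrete, here is a hypothetical sketch of how such properties could be split into per-file sets. Only the hadoop-hdfs to hdfs-site.xml mapping is stated above; the hadoop-common and hadoop-mapreduce prefixes and all class and method names are assumptions.

        // Hypothetical prefix-to-site-file routing; not the patch's code.
        import java.util.LinkedHashMap;
        import java.util.Map;
        import java.util.Properties;

        public class SiteFileRouter {
          private static final Map<String, String> PREFIX_TO_FILE = new LinkedHashMap<>();
          static {
            PREFIX_TO_FILE.put("hadoop-common.", "core-site.xml");      // assumed
            PREFIX_TO_FILE.put("hadoop-hdfs.", "hdfs-site.xml");        // from this comment
            PREFIX_TO_FILE.put("hadoop-mapreduce.", "mapred-site.xml"); // assumed
          }

          /** Splits Whirr properties into per-site-file Hadoop properties. */
          public static Map<String, Properties> route(Properties whirrProps) {
            Map<String, Properties> siteFiles = new LinkedHashMap<>();
            for (String name : whirrProps.stringPropertyNames()) {
              for (Map.Entry<String, String> e : PREFIX_TO_FILE.entrySet()) {
                if (name.startsWith(e.getKey())) {
                  siteFiles.computeIfAbsent(e.getValue(), f -> new Properties())
                      .setProperty(name.substring(e.getKey().length()),
                                   whirrProps.getProperty(name));
                }
              }
            }
            return siteFiles;
          }
        }

        With this sketch, a properties object containing hadoop-hdfs.dfs.replication=2 would yield an hdfs-site.xml entry setting dfs.replication to 2.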

        As a simplification, this patch also removes the webserver running on the namenode, since the URLs for the namenode and jobtracker are now logged explicitly:

        Namenode web UI available at http://ec2-184-73-89-144.compute-1.amazonaws.com:50070
        Jobtracker web UI available at http://ec2-184-73-89-144.compute-1.amazonaws.com:50030
        

        so you can go directly to the web UIs.

        Tom White added a comment -

        Synced patch with trunk.

        Andrei Savu added a comment -

        Looks great!

        One small issue: you should create ClusterSpec instances by using the factory methods ClusterSpec.withTemporaryKeys or ClusterSpec.withNoDefaults in tests to avoid re-adding the dependency on .ssh/id_rsa.
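
        For illustration, a test might use one of those factory methods as in the hedged sketch below; only the method names come from this comment, and the package name and test body are assumptions.

        // Hedged sketch of the suggested test setup; package assumed.
        import org.apache.whirr.service.ClusterSpec;
        import org.junit.Test;

        public class ClusterSpecFactoryExample {
          @Test
          public void buildsSpecWithoutUserSshKeys() throws Exception {
            // Generates throwaway keys instead of reading ~/.ssh/id_rsa
            ClusterSpec spec = ClusterSpec.withTemporaryKeys();
            // ... configure the spec and make assertions against it ...
          }
        }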

        All unit tests are passing for me. Unfortunately, I haven't been able to run the Hadoop integration tests. They are failing with the following errors:

        channel 2: open failed: connect failed: Connection refused
        
        -------------------------------------------------------------------------------
        Test set: org.apache.whirr.service.hadoop.integration.HadoopServiceTest
        -------------------------------------------------------------------------------
        Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 449.269 sec <<< FAILURE!
        org.apache.whirr.service.hadoop.integration.HadoopServiceTest  Time elapsed: 0 sec  <<< ERROR!
        java.io.IOException: Call to ec2-50-16-4-0.compute-1.amazonaws.com/50.16.4.0:8021 failed on local exception: java.net.SocketException: Malformed reply from SOCKS server
          at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
          at org.apache.hadoop.ipc.Client.call(Client.java:743)
          at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
          at org.apache.hadoop.mapred.$Proxy76.getProtocolVersion(Unknown Source)
          at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
          at org.apache.hadoop.mapred.JobClient.createRPCProxy(JobClient.java:429)
          at org.apache.hadoop.mapred.JobClient.init(JobClient.java:423)
          at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:410)
          at org.apache.whirr.service.hadoop.integration.HadoopServiceController.startup(HadoopServiceController.java:89)
          at org.apache.whirr.service.hadoop.integration.HadoopServiceController.ensureClusterRunning(HadoopServiceController.java:68)
          at org.apache.whirr.service.hadoop.integration.HadoopServiceTest.setUp(HadoopServiceTest.java:54)
          ...
        

        Is this only happening to me? (I've seen integration tests fail due to internet connectivity issues; I have tried multiple times.)

        Tom White added a comment -

        > you should create ClusterSpec instances by using the factory methods ClusterSpec.withTemporaryKeys or ClusterSpec.withNoDefaults in tests to avoid re-adding the dependency on .ssh/id_rsa.

        I'll produce a new patch for this.

        I've been running Hadoop integration tests OK, but I haven't run these yet. The CDH side of this patch needs doing too. I'm tempted to leave this out of 0.3.0, but would like to hear thoughts from others.

        Tom White added a comment -

        New patch for trunk addressing Andrei's comments. Haven't tested it yet. Also, still need to change CDH code.

        Tibor Kiss added a comment -

        I was just rebasing against trunk to rebuild the WHIRR-167 patch, and when I ran the integration tests for Hadoop, they failed in the same way as they did for Andrei. Has this patch been applied to trunk yet?

        Andrei Savu added a comment -

        This patch is not applied to trunk. I have just tried multiple times (even using different internet connections and cloud providers) to run the integration tests for CDH and Hadoop, and they always fail with the same error message:

        -------------------------------------------------------------------------------
        Test set: org.apache.whirr.service.cdh.integration.CdhHadoopServiceTest
        -------------------------------------------------------------------------------
        Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 336.63 sec <<< FAILURE!
        test(org.apache.whirr.service.cdh.integration.CdhHadoopServiceTest)  Time elapsed: 336.53 sec  <<< ERROR!
        java.io.IOException: Call to ec2-50-16-169-138.compute-1.amazonaws.com/50.16.169.138:8021 failed on local exception: java.net.SocketException: Malformed reply from SOCKS server
          at org.apache.hadoop.ipc.Client.wrapException(Client.java:1089)
          at org.apache.hadoop.ipc.Client.call(Client.java:1057)
          at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
          at org.apache.hadoop.mapred.$Proxy76.getProtocolVersion(Unknown Source)
          at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:369)
          at org.apache.hadoop.mapred.JobClient.createRPCProxy(JobClient.java:486)
          at org.apache.hadoop.mapred.JobClient.init(JobClient.java:471)
          at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:456)
          at org.apache.whirr.service.cdh.integration.CdhHadoopServiceTest.test(CdhHadoopServiceTest.java:87)
        

        I will try to track this down to one of the recently committed patches. We really need to set up a CI server that can run the whole suite continuously. It's extremely time-consuming to do this on a development machine.

        Tibor Kiss added a comment - edited

        Thanks.
        The last time I ran the integration tests successfully was when I applied my WHIRR-167 patch to revision 1059503 on trunk.
        We are now at revision 1065812 on trunk, so the problem was introduced somewhere between those two revisions. I'm sure you or somebody else can narrow the search interval further.

        Even with a CI server, it is sometimes inefficient to run the integration tests on every patch you apply. Maybe it is worth testing a few patches together; then, if that fails, you can divide further. The difficulty with automating the integration tests while merging patches is that sometimes you want to run them before committing. I don't know how this can be solved; perhaps you just commit the patches to trunk and then wait for the CI server results one by one? (Sorry for going off topic.)

        Tom White added a comment -

        > We really need to set up a CI server that can run the whole suite continuously.

        Agreed. I've opened WHIRR-228 to discuss this further.

        Tom White added a comment -

        Here's an updated patch that passes integration tests for both Hadoop and CDH.

        (Not to be committed until WHIRR-167 is in.)

        Andrei Savu added a comment -

        +1 The patch looks good. I have been able to run the integration tests for HBase. I'm not able to check CDH because I'm on a crappy internet connection right now. If CDH is working for you then I believe it's safe to commit.

        Tom White added a comment -

        I've just committed this. Thanks Andrei for reviewing. BTW I ran the integration tests successfully for Hadoop and CDH (on AWS and Rackspace).


          People

          • Assignee: Tom White
          • Reporter: Tom White
          • Votes: 1
          • Watchers: 2
