Whirr / WHIRR-87

Parallelize Hadoop cluster creation

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.3.0
    • Component/s: service/hadoop
    • Labels: None

      Description

      Currently the code starts a master node followed by a set of worker nodes. It would be significantly faster to start up all the nodes in one go, then configure them in parallel.
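The two-phase idea in the description (launch every node at once, then configure them concurrently) can be sketched roughly as below. This is only an illustration, not the code in the attached patches: `launchAll` and `configure` are hypothetical stand-ins for the real provisioning and setup calls, and the node names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class TwoPhaseStartup {
    // Hypothetical stand-in for a provider call that boots all nodes in one request.
    static List<String> launchAll(int count) {
        List<String> nodes = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            nodes.add("node-" + i);
        }
        return nodes;
    }

    // Hypothetical stand-in for running the setup script on a single node.
    static String configure(String node) {
        return node + ":configured";
    }

    static List<String> startCluster(int count) {
        // Phase 1: start every node in one go.
        List<String> nodes = launchAll(count);

        // Phase 2: configure the nodes concurrently, one task per node.
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, nodes.size()));
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (final String node : nodes) {
                tasks.add(() -> configure(node));
            }
            List<String> results = new ArrayList<>();
            // invokeAll blocks until all tasks finish and keeps the task order.
            for (Future<String> done : pool.invokeAll(tasks)) {
                results.add(done.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(startCluster(3));
    }
}
```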

      1. WHIRR-87.patch
        84 kB
        Tom White
      2. WHIRR-87.patch
        38 kB
        Tom White
      3. WHIRR-87.patch
        24 kB
        Tom White
      4. WHIRR-87.patch
        45 kB
        Tom White
      5. WHIRR-87.patch
        35 kB
        Tom White

        Activity

        Tom White added a comment -

        I've just committed this.

        Tom White added a comment -

        This latest patch works with both Apache Hadoop and CDH. There's now a test for each. I'd like to commit it soon, since it has a tendency to fall out of date easily.

        Patrick Hunt added a comment -

        Not a blocker for 0.2.0, pushing to 0.3.0.

        Tom White added a comment -

        Regenerated against trunk. Not yet tested.

        Tom White added a comment -

        No, there's no connection - I was just having problems running the tests. It's been OK since then. The patch no longer applies, but I'd like to get WHIRR-106 and WHIRR-101 in before redoing this one.

        Adrian Cole added a comment -

        The UnknownHostException is distracting, as it isn't related to this change. That said, there is a jclouds enhancement logged to restrict the view to only certain regions: http://code.google.com/p/jclouds/issues/detail?id=367

        I'd suggest decoupling this issue from the network optimization issue, unless there's some connection between the two that I'm missing.

        Tom White added a comment -

        Updated patch for jclouds 1.0-beta-7. I'm having problems running it, though. I got:

        Running org.apache.whirr.service.hadoop.integration.HadoopServiceTest
        java.net.UnknownHostException: ec2.ap-southeast-1.amazonaws.com
             at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:177)
             at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:432)
             at java.net.Socket.connect(Socket.java:529)
             at com.sun.net.ssl.internal.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:550)
             at com.sun.net.ssl.internal.ssl.BaseSSLSocketImpl.connect(BaseSSLSocketImpl.java:141)
             at sun.net.NetworkClient.doConnect(NetworkClient.java:163)
             at sun.net.www.http.HttpClient.openServer(HttpClient.java:394)
             at sun.net.www.http.HttpClient.openServer(HttpClient.java:529)
             at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:272)
             at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:329)
             at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:172)
             at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:801)
             at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:158)
             at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:904)
             at sun.net.www.protocol.https.HttpsURLConnectionImpl.getOutputStream(HttpsURLConnectionImpl.java:230)
             at org.jclouds.http.internal.JavaUrlHttpCommandExecutorService.convert(JavaUrlHttpCommandExecutorService.java:221)
             at org.jclouds.http.internal.JavaUrlHttpCommandExecutorService.convert(JavaUrlHttpCommandExecutorService.java:76)
             at org.jclouds.http.internal.BaseHttpCommandExecutorService$HttpResponseCallable.call(BaseHttpCommandExecutorService.java:153)
             at org.jclouds.http.internal.BaseHttpCommandExecutorService$HttpResponseCallable.call(BaseHttpCommandExecutorService.java:132)
             at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
             at java.util.concurrent.FutureTask.run(FutureTask.java:138)
             at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
             at java.lang.Thread.run(Thread.java:637)
        

        And on a later run I got the same for ec2.us-west-1.amazonaws.com. Any ideas what's happening here?

        Tom White added a comment -

        Thanks for the review, Adrian. Thinking about it more, WHIRR-90 should be committed before this one, since it separates the scripts into versioned directories on the server.

        Adrian Cole added a comment -

        +1, nice pipelining of tasks.

        Once we move to jclouds 1.0-beta-7, you'll be able to get an executorService from the compute context.

        Tom White added a comment -

        I'd like to commit this soon unless there are any objections.

        Tom White added a comment -

        To help test this, I uploaded the scripts to a subdirectory in the whirr bucket, so you can do -Dwhirr.runurl.base=http://whirr.s3.amazonaws.com/WHIRR-87/.
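For readers unfamiliar with `-D` overrides, the following is a rough sketch of how a base URL passed as a JVM system property might be resolved into a script location. Only the property name `whirr.runurl.base` and the WHIRR-87 bucket URL come from this thread; the default base, the script name, and the `scriptUrl` helper are assumptions made for illustration and are not Whirr's actual code.

```java
class RunUrlBase {
    // Property name as used on the command line in this thread.
    static final String PROPERTY = "whirr.runurl.base";

    // Placeholder default (an assumption), used when no -D override is given.
    static final String DEFAULT_BASE = "http://example.com/scripts/";

    // Hypothetical helper: builds the full URL of a named script under the base.
    static String scriptUrl(String scriptName) {
        String base = System.getProperty(PROPERTY, DEFAULT_BASE);
        if (!base.endsWith("/")) {
            base = base + "/";
        }
        return base + scriptName;
    }

    public static void main(String[] args) {
        // Run with -Dwhirr.runurl.base=http://whirr.s3.amazonaws.com/WHIRR-87/
        // to see the override take effect; "some-script" is a made-up name.
        System.out.println(scriptUrl("some-script"));
    }
}
```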

        Tom White added a comment -

        The previous patch was missing a new file.

        With the patch, the run time for the Hadoop test comes down from 7 minutes 42 seconds to 5 minutes 28 seconds for me (in one run). I had to set -Dwhirr.runurl.base on the command line to point at my own S3 bucket, which holds the new versions of the scripts.

        Tom White added a comment -

        Here's a patch which uses an ExecutorService to start master and worker nodes in parallel.
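As a rough illustration of that approach (not the patch itself), the master and the worker group can each be submitted to an ExecutorService so that neither waits for the other. The `launch` method and the role names are hypothetical placeholders for the real node-provisioning call.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelLaunch {
    // Hypothetical stand-in for provisioning `count` nodes of a given role.
    static String launch(String role, int count) {
        return role + " x" + count;
    }

    static List<String> launchCluster(final int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Callable<String> masterTask = () -> launch("master", 1);
            Callable<String> workerTask = () -> launch("worker", workers);

            // Both submissions return immediately, so the launches overlap.
            Future<String> master = pool.submit(masterTask);
            Future<String> workerGroup = pool.submit(workerTask);

            // get() blocks until the corresponding launch has finished.
            return Arrays.asList(master.get(), workerGroup.get());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(launchCluster(2));
    }
}
```

With sequential startup the total time is roughly the sum of the two launches; with both submitted up front it is closer to the longer of the two.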


          People

          • Assignee: Tom White
          • Reporter: Tom White
          • Votes: 1
          • Watchers: 0