Hadoop Common: HADOOP-952

Create a public (shared) Hadoop EC2 AMI

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.11.2
    • Fix Version/s: 0.12.0
    • Component/s: scripts
    • Labels: None

      Description

      HADOOP-884 makes it easy to run Hadoop on an EC2 cluster, but building an AMI (Amazon Machine Image) can take a little while. Amazon EC2 supports shared AMIs (http://developer.amazonwebservices.com/connect/entry.jspa?entryID=530&ref=featured), so we could provide publicly available AMIs for each Hadoop release.

      Attachments

      1. ec2-ami-bin.tar
        4 kB
        James P. White
      2. ec2-ami-bin-v2.tar
        4 kB
        James P. White
      3. hadoop-952.patch
        8 kB
        Tom White
      4. hadoop-952-jim.patch
        14 kB
        James P. White
      5. hadoop-952-jim-v2.patch
        14 kB
        James P. White
      6. hadoop-952-v2.patch
        9 kB
        Tom White
      7. hadoop-952-v3.patch
        9 kB
        Tom White
      8. hadoop-952-v4.patch
        16 kB
        Tom White
      9. hadoop-952-v4.tar
        30 kB
        Tom White

          Activity

          Doug Cutting added a comment -

          This would be great to have! Someone would need to donate the S3 storage for these images, but that should be pretty cheap.

          Tom White added a comment -

          I was planning on using my S3 storage - at least until the AMI got too popular

          Tom White added a comment -

          This patch includes changes to the EC2 scripts to support creation of public AMIs. The main changes are to do with tightening up security - there is a good checklist at http://docs.amazonwebservices.com/AmazonEC2/dg/2006-10-01/public-ami-guidelines.html. The important thing is to clear out keys before bundling the image. Also, since the Hadoop AMIs were previously private it was OK to create a new SSH key for the cluster and embed it in the image - this is now a big no-no, since it would allow people to connect to someone else's cluster! Instead, your EC2 keypair is used for password-less logins across the cluster.
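
          To give a rough idea of what "clearing out keys" involves, the cleanup before bundling looks something like the following (an illustrative sketch only; the exact commands and file list in the create-image script may differ):

             # Illustrative cleanup before bundling the image (sketch, not the actual script):
             # remove key material and shell history so they aren't baked into the public AMI.
             rm -f /root/.ssh/authorized_keys /root/.ssh/id_rsa /root/.ssh/id_rsa.pub
             rm -f /etc/ssh/ssh_host_*key*   # host keys; assumes they are regenerated on first boot
             rm -f /root/.bash_history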

          Before publishing some images, it would be great if someone could test a private image I have created and sanity check the setup. I'll grant access using the mechanism described here: http://developer.amazonwebservices.com/connect/entry.jspa?entryID=530&ref=featured. So, if you have an EC2 account and would like to help please email me (off list) with your AWS account ID (note this is not either of your access keys).

          After this I'll create a public image.

          Tom White added a comment -

          Actually, it is not true that embedding a private SSH key in the image would allow people to connect to other clusters: the cluster runs in a security group that only allows other machines in the cluster or a given owner to connect. (See the ec2-authorize command.) However, it is still a bad idea to embed a private SSH key in an image, in case people get their security groups misconfigured.
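
          For reference, the sort of ec2-authorize rules meant here look roughly like this (the group name and account ID are placeholders, not necessarily the values used by the scripts):

             # Open SSH to the cluster group from anywhere (placeholder group name)
             ec2-authorize hadoop-cluster-group -P tcp -p 22 -s 0.0.0.0/0
             # Let instances in the group reach each other (group-to-group rule;
             # 123456789012 is a placeholder AWS account ID)
             ec2-authorize hadoop-cluster-group -o hadoop-cluster-group -u 123456789012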

          James P. White added a comment -

          Hi Tom!

          You wrote:

          > ...
          > Any problems or questions, give me a shout! (Let me know how it goes
          > anyway.)

          I've gotten set up on EC2 and given your image a whirl.

          The biggest problem I had was figuring out the S3_BUCKET.

          I got HADOOP_VERSION wrong a couple times.

          I also spent a while getting the EC2_KEYDIR and SSH_OPTS set to use my scheme.

          These are the settings I wound up with:

             # The Amazon S3 bucket where the Hadoop AMI you create will be stored.
             S3_BUCKET=hadoop-ec2-images

             # Location of EC2 keys.
             # The default setting is probably OK if you set up EC2 following the Amazon Getting Started guide.
             EC2_KEYDIR=`dirname "$EC2_PRIVATE_KEY"`

             # SSH options used when connecting to EC2 instances.
             # Change the -i option to be the absolute path to your keypair that you set up in the Amazon Getting Started guide.
             SSH_OPTS=`echo -i "$EC2_KEYDIR"/id_rsa-gsg-keypair -o StrictHostKeyChecking=no`

             # The download URL for the Sun JDK. Visit http://java.sun.com/javase/downloads/index_jdk5.jsp
             # and get the URL for the "Linux self-extracting file".
             JAVA_BINARY_URL=''

             # The version number of the installed JDK.
             JAVA_VERSION=1.5.0_11

             # The EC2 group to run your cluster in.
             GROUP=hadoop-cluster-group

             # The version of Hadoop to install.
             HADOOP_VERSION=0.11.0

          I think those are somewhat better defaults. The others are much more self-explanatory.

          I also had to rerun the run-cluster code following the "Waiting before ..." point multiple times to get the settings worked out, so I made a shortened version (rerun-). I also made a login script (which turns out to be a good test before doing the "Creating instances..." business).

          I then tried to run the pi sample job per the wiki page, but got an exception:

          [root@domU-12-31-34-00-03-2F ~]# cd /usr/local/hadoop-0.11.0/
          [root@domU-12-31-34-00-03-2F hadoop-0.11.0]# bin/hadoop jar hadoop-0.11.0-examples.jar pi 10 10000000
          Number of Maps = 10 Samples per Map = 10000000
          org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.ArithmeticException: / by zero
          at org.apache.hadoop.dfs.FSNamesystem$Replicator.chooseTarget(FSNamesystem.java:2593)
          at org.apache.hadoop.dfs.FSNamesystem$Replicator.chooseTarget(FSNamesystem.java:2555)
          at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:684)
          at org.apache.hadoop.dfs.NameNode.create(NameNode.java:248)
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
          at java.lang.reflect.Method.invoke(Method.java:585)
          at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:337)
          at org.apache.hadoop.ipc.Server$Handler.run(Server.java:538)

          at org.apache.hadoop.ipc.Client.call(Client.java:467)
          at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:164)
          at org.apache.hadoop.dfs.$Proxy0.create(Unknown Source)
          at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.locateNewBlock(DFSClient.java:1091)
          at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:1031)
          at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.endBlock(DFSClient.java:1255)
          at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.close(DFSClient.java:1345)
          at java.io.FilterOutputStream.close(FilterOutputStream.java:143)
          at java.io.FilterOutputStream.close(FilterOutputStream.java:143)
          at java.io.FilterOutputStream.close(FilterOutputStream.java:143)
          at org.apache.hadoop.fs.FSDataOutputStream$Summer.close(FSDataOutputStream.java:98)
          at java.io.FilterOutputStream.close(FilterOutputStream.java:143)
          at java.io.FilterOutputStream.close(FilterOutputStream.java:143)
          at java.io.FilterOutputStream.close(FilterOutputStream.java:143)
          at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:724)
          at org.apache.hadoop.examples.PiEstimator.launch(PiEstimator.java:185)
          at org.apache.hadoop.examples.PiEstimator.main(PiEstimator.java:226)
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
          at java.lang.reflect.Method.invoke(Method.java:585)
          at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
          at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:143)
          at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:40)
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
          at java.lang.reflect.Method.invoke(Method.java:585)
          at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
          [root@domU-12-31-34-00-03-2F hadoop-0.11.0]#

          Tom White added a comment -

          Jim,

          Thanks for giving the scripts a whirl. Looks like you may have been using the unpatched scripts, as the patch fixes the variables pretty much in the way you suggest (sorry if I wasn't clearer in explaining how to use them). Nevertheless, I've improved the wording in the setup file further, and I've included your handy login script in a new patch. (I didn't include your rerun script, as I hope this won't be needed too much.)

          The ArithmeticException is a mystery to me as I haven't been able to reproduce it, which is odd given that we are running the same AMI. Could you run any of the other examples? Also it might be worth looking in the log files to see if anything else failed.

          Tom White added a comment -

          New patch (v3) that fixes a security hole uncovered by Jim. There still seems to be a problem in Jim's setup which is producing an ArithmeticException (see HADOOP-1013).

          James P. White added a comment -

          I've applied the v3 patch and tried out the new AMI which successfully ran the "pi" example.

          Had some trouble getting the SSH settings right again. I need env.sh to look like this:

             # Location of EC2 keys.
             # The default setting is probably OK if you set up EC2 following the Amazon Getting Started guide.
             EC2_KEYDIR=`dirname "$EC2_PRIVATE_KEY"`

             # The EC2 key name used to launch instances.
             # The default is the value used in the Amazon Getting Started guide.
             KEY_NAME=gsg-keypair

             # Where your EC2 private key is stored (created when following the Amazon Getting Started guide).
             PRIVATE_KEY_PATH=`echo "$EC2_KEYDIR"/"id_rsa-$KEY_NAME"`

             # SSH options used when connecting to EC2 instances.
             SSH_OPTS=`echo -i "$PRIVATE_KEY_PATH" -o StrictHostKeyChecking=no`

          The reason for the `echo ...` business is that I need paths with embedded spaces to work.

          Also, I really think 'run-hadoop-cluster' should be split in two. The part where it waits for DynDNS to be set up should simply end, and the second part should be a separate script. A user with a new setup would also be advised to run "login-hadoop-cluster" before running the second part to verify the settings.

          James P. White added a comment -

          This patch implements the changes I suggest, including a sanity check at the initialization step that SSH to the master works.

          James P. White added a comment -

          This is a tar of the src/contrib/ec2 directory with my patch applied. This would be helpful to someone who wanted to do the minimum to get started on EC2+Hadoop.

          James P. White added a comment -

          I was in a rush and hadn't tested the refactored run-hadoop-cluster so of course it was quite broken. The patch for the fixed and tested version is attached.

          James P. White added a comment -

          Ditto to the above - tar of scripts with jim-v2 patch applied.

          James P. White added a comment -

          So now the startup scripts are all lovely and I can run the Pi example. I was trying to find other tests to run, and came up with:

          [root@domU-12-31-34-00-02-B4 hadoop-0.11.1]# bin/hadoop jar hadoop-0.11.1-test.jar DFSCIOTest -write
          DFSCIOTest.0.0.1
          07/02/14 01:24:03 INFO mapred.InputFormatBase: nrFiles = 1
          07/02/14 01:24:03 INFO mapred.InputFormatBase: fileSize (MB) = 1
          07/02/14 01:24:03 INFO mapred.InputFormatBase: bufferSize = 1000000
          /usr/local/hadoop-0.11.1/libhdfs/libhdfs.so.1: No such file or directory

          That looks like some LIBPATH problem.
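
          If the shared library does exist somewhere in the image, something along these lines might get the test going (an untested guess on my part; the path below is just the one from the error message):

             # Point the dynamic linker at the bundled libhdfs and retry (untested)
             export LD_LIBRARY_PATH=/usr/local/hadoop-0.11.1/libhdfs:$LD_LIBRARY_PATH
             bin/hadoop jar hadoop-0.11.1-test.jar DFSCIOTest -write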

          Tom White added a comment -

          This v4 patch applies all of Jim's changes, but with the following differences:

          • Re-instate some changes in the create-hadoop-image script that had been lost.
          • Rename start-hadoop-cluster to launch-hadoop-cluster, and init-hadoop-cluster to start-hadoop.
           • Add a top-level script, hadoop-ec2, for running commands, which provides simple usage instructions (see the sketch below).
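
           To make the idea concrete, the wrapper is just a thin dispatcher along these lines (an illustrative sketch only; the subcommand names shown here are not necessarily the ones in the patch):

             #!/bin/sh
             # Sketch of a hadoop-ec2 dispatcher; the real script in the patch may differ.
             bin=`dirname "$0"`
             case "$1" in
               create-image)    "$bin"/create-hadoop-image ;;
               launch-cluster)  "$bin"/launch-hadoop-cluster ;;
               start-hadoop)    "$bin"/start-hadoop ;;
               login)           "$bin"/login-hadoop-cluster ;;
               *) echo "Usage: hadoop-ec2 (create-image|launch-cluster|start-hadoop|login)" ;;
             esac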

          Once this is committed, I will make the AMIs public and update the wiki instructions.

          Tom White added a comment -

          A corresponding tar file of changes: hadoop-952-v4.tar.

          James P. White added a comment -

          I tried out the v4-tar scripts and they work fine. The choice of the "start-" prefix reflected that the cluster nodes were being started, but obviously your choice is fine.

          One thing that might be worthwhile adding is a HADOOP_HOME setting in the AMI, perhaps with a PATH update too. That could be done with a "/root/hadoop-env.sh" file and/or a "/usr/local/hadoop-current" symlink or the like.
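
          Something along these lines is what I have in mind (a sketch only; the paths and symlink name are suggestions, not anything in the current image):

             # Suggested /root/hadoop-env.sh to bake into the AMI (sketch)
             export HADOOP_HOME=/usr/local/hadoop-current
             export PATH=$HADOOP_HOME/bin:$PATH

             # with a version-independent symlink created at image-build time, e.g.
             #   ln -s /usr/local/hadoop-0.11.1 /usr/local/hadoop-current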

          I see how the HADOOP_VERSION in the local env.sh works and selects the right AMI, and maybe avoiding the duplicate settings is the right thing, but with the way it is now it doesn't "feel" like Hadoop is "installed" in the AMI. But since I'm a cluster newbie, this may be something I'll change my mind on.

          And speaking of that, the Hadoop version thing being in those jar file names seems like a problem too.

          Any notion what's wrong with my attempt to run "bin/hadoop jar hadoop-0.11.1-test.jar DFSCIOTest -write"?

          Tom White added a comment -

          Jim, glad the scripts work for you. Would you be happy for the changes to be committed? I feel the enhancements you mention belong in another Jira. (I've been thinking about how to manage various versions of Hadoop AMIs, so it would be good to take this further.)

          I'll look at the DFSCIOTest problem too.

          James P. White added a comment -

          +1 on committing. Dealing with patches makes this code hard to work on.

          Issues are cheap (especially when they get closed), so opening new ones for enhancements and the like is probably a good idea. I like the project's approach of keeping most discussion in Jira (because posts on the list will be lost to future developer/users whereas Jira issues keep it all together). The unclear bit is when it makes sense to put somewhat-off-the-issue's-topic-but-related in a different issue or list posting.

          So whenever you're ready to close this issue and move on is fine with me.

          Doug Cutting added a comment -

          +1 This looks good to me.

          Tom White added a comment -

          I've just committed this.


            People

            • Assignee: Tom White
            • Reporter: Tom White
            • Votes: 0
            • Watchers: 0