Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.10.0
    • Fix Version/s: 0.10.0
    • Component/s: Tools
    • Labels:

      Description

      Hi guys,

      I'd like to contribute an EMR bootstrap script for Tajo. With this script, you can easily launch a Tajo cluster on EMR.

        Activity

        githubbot ASF GitHub Bot added a comment -

        GitHub user hys9958 opened a pull request:

        https://github.com/apache/tajo/pull/257

        TAJO-1199: EMR bootstrap script for Tajo

        I'll describe some examples soon.

        You can merge this pull request into a Git repository by running:

        $ git pull https://github.com/hys9958/tajo tajo-1199

        Alternatively you can review and apply these changes as the patch at:

        https://github.com/apache/tajo/pull/257.patch

        To close this pull request, make a commit to your master/trunk branch
        with (at least) the following in the commit message:

        This closes #257


        commit 67bdc71b17b54da731eaf01b9fac51ad8983353b
        Author: JaeHwa Jung <blrunner@apache.org>
        Date: 2014-11-19T06:46:47Z

        TAJO-1195: Remove unused CachedDNSResolver Class. (DaeMyung Kang via jaehwa)

        Closes #253

        commit 342d842d607e2085281932f0f962a3212ab56522
        Author: hys9958 <hanyounsu@gmail.com>
        Date: 2014-11-19T09:18:13Z

        TAJO-1199 EMR bootstrap script for Tajo

        commit 1c00584f569ef32e40fb399e4f5ab8226217adcd
        Author: hys9958 <hanyounsu@gmail.com>
        Date: 2014-11-19T09:42:34Z

        TAJO-1199 EMR bootstrap script for Tajo

        commit f1d93e48431a9d47b038f6839a194c0722c10a58
        Author: hys9958 <hanyounsu@gmail.com>
        Date: 2014-11-19T10:02:08Z

        TAJO-1199 EMR bootstrap script for Tajo


        githubbot ASF GitHub Bot added a comment -

        Github user hys9958 closed the pull request at:

        https://github.com/apache/tajo/pull/257

        githubbot ASF GitHub Bot added a comment -

        GitHub user hys9958 opened a pull request:

        https://github.com/apache/tajo/pull/258

        TAJO-1199: EMR bootstrap script for Tajo

        I'll describe some examples soon.

        You can merge this pull request into a Git repository by running:

        $ git pull https://github.com/hys9958/tajo tajo-1199

        Alternatively you can review and apply these changes as the patch at:

        https://github.com/apache/tajo/pull/258.patch

        To close this pull request, make a commit to your master/trunk branch
        with (at least) the following in the commit message:

        This closes #258


        commit 67bdc71b17b54da731eaf01b9fac51ad8983353b
        Author: JaeHwa Jung <blrunner@apache.org>
        Date: 2014-11-19T06:46:47Z

        TAJO-1195: Remove unused CachedDNSResolver Class. (DaeMyung Kang via jaehwa)

        Closes #253

        commit 342d842d607e2085281932f0f962a3212ab56522
        Author: hys9958 <hanyounsu@gmail.com>
        Date: 2014-11-19T09:18:13Z

        TAJO-1199 EMR bootstrap script for Tajo

        commit 1c00584f569ef32e40fb399e4f5ab8226217adcd
        Author: hys9958 <hanyounsu@gmail.com>
        Date: 2014-11-19T09:42:34Z

        TAJO-1199 EMR bootstrap script for Tajo

        commit f1d93e48431a9d47b038f6839a194c0722c10a58
        Author: hys9958 <hanyounsu@gmail.com>
        Date: 2014-11-19T10:02:08Z

        TAJO-1199 EMR bootstrap script for Tajo

        commit 3d380150b6e2bba60cc17948b13fe09e224fcd8d
        Author: hys9958 <hanyounsu@gmail.com>
        Date: 2014-11-19T10:17:10Z

        TAJO-1199 EMR bootstrap script for Tajo


        githubbot ASF GitHub Bot added a comment -

        Github user hys9958 closed the pull request at:

        https://github.com/apache/tajo/pull/258

        githubbot ASF GitHub Bot added a comment -

        GitHub user hys9958 opened a pull request:

        https://github.com/apache/tajo/pull/269

        TAJO-1199: EMR bootstrap script for Tajo

        Using aws-cli:

        • Install aws-cli as described at http://docs.aws.amazon.com/cli/latest/userguide/installing.html
        • Start Tajo on an EMR cluster like this:
          $ aws emr create-cluster --name {cluster_name} \
              --ami-version 3.3 \
              --instance-type {instance_type} \
              --instance-count {instance_count} \
              --ec2-attributes KeyName={key_pair_name} \
              --bootstrap-action Path=s3://{your_bucket}/install-EMR-tajo.sh,Args=["-t","s3://{your_bucket}/tajo-0.9.0.tar.gz","-c","s3://{your_bucket}/conf","-l","s3://{your_bucket}/lib"]

        • Bootstrap arguments:
        • '-t' is the URL of the Tajo binary tarball.
        • '-c' is the URL of the Tajo conf directory.
        • '-l' is the URL of the Tajo third-party lib directory.

        You can merge this pull request into a Git repository by running:

        $ git pull https://github.com/hys9958/tajo tajo-1199

        Alternatively you can review and apply these changes as the patch at:

        https://github.com/apache/tajo/pull/269.patch

        To close this pull request, make a commit to your master/trunk branch
        with (at least) the following in the commit message:

        This closes #269


        commit 80afe993b82d57582fbeab64d20199f4dfa3d9af
        Author: jhkim <jhkim@apache.org>
        Date: 2014-11-21T07:23:32Z

        TAJO-1205: Remove possible memory leak in TajoMaster. (jinho)

        Closes #265

        commit 32b521d63d5e95f4a0d4ef412346f0ce57f95e86
        Author: hys9958 <hanyounsu@gmail.com>
        Date: 2014-11-24T08:25:34Z

        TAJO-1199 EMR bootstrap script for Tajo


        githubbot ASF GitHub Bot added a comment -

        Github user hyunsik commented on the pull request:

        https://github.com/apache/tajo/pull/269#issuecomment-64321768

        The patch looks good to me. I'd like to suggest two things.

        • The first is long options, which are more descriptive. For example, it would be great if the script provided ```-tar``` (```-t```), ```-conf``` (```-c```), and ```-lib``` (```-l```).
        • The second is more modularization. Currently, the steps run as one long sequence of routines. Could you split them into separate functions?
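
Both suggestions could look roughly like the sketch below. It is only an illustration: the function names, variable names, and the `echo` placeholders are hypothetical and are not taken from the script in this pull request.

```bash
#!/usr/bin/env bash
# Hypothetical structure only: names and step bodies are illustrative.

# Accept both the short options and the suggested long aliases.
parse_args() {
  TARBALL_URL=""; CONF_URL=""; LIB_URL=""
  while [ "$#" -gt 0 ]; do
    case "$1" in
      -t|-tar)  TARBALL_URL="${2:-}" ;;
      -c|-conf) CONF_URL="${2:-}" ;;
      -l|-lib)  LIB_URL="${2:-}" ;;
      *) echo "Unknown option: $1" >&2; return 1 ;;
    esac
    # Every option takes exactly one value.
    if [ "$#" -lt 2 ]; then
      echo "Option $1 requires a value" >&2
      return 1
    fi
    shift 2
  done
}

# Each step is its own function; the bodies are placeholders.
fetch_tarball() { echo "fetch tarball: $TARBALL_URL"; }
install_conf()  { echo "install conf: $CONF_URL"; }
install_libs()  { echo "install libs: $LIB_URL"; }

main() {
  parse_args "$@" || return 1
  fetch_tarball
  if [ -n "$CONF_URL" ]; then install_conf; fi
  if [ -n "$LIB_URL" ]; then install_libs; fi
}
```

A real script would end with `main "$@"` and replace the placeholders with the actual download and copy steps.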
        githubbot ASF GitHub Bot added a comment -

        Github user hys9958 closed the pull request at:

        https://github.com/apache/tajo/pull/269

        githubbot ASF GitHub Bot added a comment -

        GitHub user hys9958 opened a pull request:

        https://github.com/apache/tajo/pull/275

        TAJO-1199: EMR bootstrap script for Tajo

        Bootstrap Action Arguments:
        ==========================

        Usage: install-tajo.sh [OPTIONS]

        -t [S3_PATH_TO_TAJO_BIN_TARBALL]
        Ex: s3://[your_bucket]/[your_path]/tajo-{version}.tar.gz
        Default: http://d3kp3z3ppbkcio.cloudfront.net/tajo-0.9.0/tajo-0.9.0.tar.gz
        -c [S3_PATH_TO_TAJO_CONF_DIR]
        Ex: s3://[your_bucket]/[your_path]/conf
        -l [S3_PATH_TO_THIRD_PARTY_JARS_DIR]
        Ex: s3://[your_bucket]/[your_path]/lib
        -h
        Display help message
        -T [LOCAL_PATH_TO_TEST_ROOT] (only used for local test)
        Ex: /[LOCAL_PATH_TO_TEST_ROOT]
        -H [LOCAL_PATH_TO_HADOOP_HOME_FOR_TEST] (only used for local test)
        Ex: /[LOCAL_PATH_TO_HADOOP_HOME_FOR_TEST]

        Note that all arguments are optional. ``-T`` and ``-H`` are only for local testing.
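
For illustration, the options above map naturally onto `getopts`. The following is a minimal sketch, assuming hypothetical variable names (`TAJO_TARBALL`, `TAJO_CONF_DIR`, and so on); it is not the actual contents of install-tajo.sh.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of parsing the documented options; the variable
# names are illustrative and are not taken from install-tajo.sh.

DEFAULT_TARBALL="http://d3kp3z3ppbkcio.cloudfront.net/tajo-0.9.0/tajo-0.9.0.tar.gz"

parse_options() {
  # Reset state so the function can be called more than once.
  TAJO_TARBALL="$DEFAULT_TARBALL"
  TAJO_CONF_DIR=""
  TAJO_LIB_DIR=""
  TEST_ROOT=""
  TEST_HADOOP_HOME=""
  SHOW_HELP=0
  OPTIND=1
  # The leading ':' selects silent error reporting, handled in the case below.
  while getopts ":t:c:l:T:H:h" opt; do
    case "$opt" in
      t) TAJO_TARBALL="$OPTARG" ;;
      c) TAJO_CONF_DIR="$OPTARG" ;;
      l) TAJO_LIB_DIR="$OPTARG" ;;
      T) TEST_ROOT="$OPTARG" ;;
      H) TEST_HADOOP_HOME="$OPTARG" ;;
      h) SHOW_HELP=1 ;;
      :) echo "Option -$OPTARG requires an argument" >&2; return 1 ;;
      \?) echo "Unknown option: -$OPTARG" >&2; return 1 ;;
    esac
  done
}
```

Because every argument is optional, calling `parse_options` with no arguments leaves the tarball at its default URL and the other values empty.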

        Sample Commands:
        ================

        Launching a Tajo cluster with the default configuration
        -------------------------------------------------------

        • It uses EMR HDFS as ```tajo.rootdir```, which includes the warehouse directory.
        • It uses the default heap and concurrency settings.
        • It is good for a simple test.

        ```
        $ aws emr create-cluster \
        --name="[CLUSTER_NAME]" \
        --ami-version=3.3 \
        --ec2-attributes KeyName=[KEY_PAIR_NAME] \
        --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge InstanceGroupType=CORE,InstanceCount=1,InstanceType=c3.xlarge \
        --bootstrap-action Name="Install tajo",Path=s3://[your_bucket]/[your_path]/install-tajo.sh
        ```

        Launching a Tajo cluster with additional configurations
        -------------------------------------------------------

        • To use your own Tajo tarball, use ```-t``` to specify its S3 URL.
        • To change ```tajo.rootdir```, make your own ```tajo-site.xml``` and use the ```-c``` option to specify the S3 URL of the config directory.
        • You can find appropriate config templates in tajo-emr/template.
        • To use RDS, you need the appropriate JDBC jars, such as mysql-connector.jar. The ```-l``` option lets you specify the S3 URL of a directory containing third-party jars.

        ```
        aws emr create-cluster \
        --name="[CLUSTER_NAME]" \
        --ami-version=3.3 \
        --ec2-attributes KeyName=[KEY_PAIR_NAME] \
        --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge InstanceGroupType=CORE,InstanceCount=1,InstanceType=c3.xlarge \
        --bootstrap-action Name="Install tajo",Path=s3://[your_bucket]/[your_path]/install-tajo.sh,Args=["-t","s3://[your_bucket]/tajo-0.9.0.tar.gz","-c","s3://[your_bucket]/conf","-l","s3://[your_bucket]/lib"]
        ```
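
Since every bootstrap argument is optional, the `Args=[...]` part of `--bootstrap-action` can be assembled from whichever values are set. The helper below is a hypothetical sketch of that assembly (the function name `build_bootstrap_args` is not part of this pull request):

```bash
#!/usr/bin/env bash
# Hypothetical helper: builds the Args=[...] fragment from optional values.
# Usage: build_bootstrap_args TARBALL_URL CONF_URL LIB_URL  (pass "" to omit one)
build_bootstrap_args() {
  tarball="$1"; conf="$2"; lib="$3"
  args=""
  if [ -n "$tarball" ]; then args="$args,\"-t\",\"$tarball\""; fi
  if [ -n "$conf" ]; then args="$args,\"-c\",\"$conf\""; fi
  if [ -n "$lib" ]; then args="$args,\"-l\",\"$lib\""; fi
  # Strip the leading comma; print nothing when no option was given.
  if [ -n "$args" ]; then
    printf 'Args=[%s]\n' "${args#,}"
  fi
}
```

When the output is non-empty, it can be appended after a comma to the `Name=...,Path=...` part of the bootstrap action.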

        How to test the bootstrap on a local machine
        =======================================
        ```install-tajo.sh``` allows users to test the bootstrap on a local machine without EMR instances. To do so, use the ```-T``` and ```-H``` options.

        • ```-T``` - Test root directory, used temporarily during testing.
        • ```-H``` - Hadoop binary directory, which stands in for the EMR Hadoop home.

        ```
        $ ./install-tajo.sh -t /[your_local_binary_path]/tajo-0.9.0.tar.gz -c /[your_test_conf_dir]/conf -l /[your_test_lib_dir]/lib -T /[LOCAL_PATH_TO_TEST_ROOT] -H /[LOCAL_PATH_TO_HADOOP_HOME_FOR_TEST]
        ```

        Running with AWS RDS
        ====================
        Tajo can use RDS. To do so, make sure you already have a running RDS instance, then create your own ```catalog-site.xml```. Please refer to the [Catalog configuration documentation](http://tajo.apache.org/docs/current/configuration/catalog_configuration.html) in the Tajo documentation.

        Also, use the ```-c``` option so that your custom ```catalog-site.xml``` file is picked up.
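
As a rough sketch, a ```catalog-site.xml``` for a MySQL-backed RDS instance could be generated like this. The helper name and its parameters are hypothetical, and the property names are assumed to match the Catalog configuration documentation linked above, so please check them against that page before use.

```bash
#!/usr/bin/env bash
# Hypothetical helper: writes a catalog-site.xml for a MySQL catalog store.
# Property names are assumed from the Tajo catalog configuration docs.
# Values are written verbatim, so XML special characters must be escaped
# by the caller.
write_catalog_site() {
  conf_dir="$1"; host="$2"; port="$3"; db="$4"; user="$5"; pass="$6"
  mkdir -p "$conf_dir"
  {
    printf '%s\n' '<?xml version="1.0"?>' '<configuration>'
    # printf reuses the format for each name/value pair that follows.
    printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' \
      tajo.catalog.store.class org.apache.tajo.catalog.store.MySQLStore \
      tajo.catalog.jdbc.connection.id "$user" \
      tajo.catalog.jdbc.connection.password "$pass" \
      tajo.catalog.jdbc.uri "jdbc:mysql://$host:$port/$db?createDatabaseIfNotExist=true"
    printf '%s\n' '</configuration>'
  } > "$conf_dir/catalog-site.xml"
}
```

The resulting directory would then be uploaded to S3 (for example with `aws s3 cp --recursive`) and passed to the bootstrap via ```-c```, with the MySQL connector jar supplied via ```-l```.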

        You can merge this pull request into a Git repository by running:

        $ git pull https://github.com/hys9958/tajo tajo-1199

        Alternatively you can review and apply these changes as the patch at:

        https://github.com/apache/tajo/pull/275.patch

        To close this pull request, make a commit to your master/trunk branch
        with (at least) the following in the commit message:

        This closes #275


        commit 0b4b135c81ca3548e78d622c26027808883b9c9f
        Author: hys9958 <hanyounsu@gmail.com>
        Date: 2014-12-01T07:06:43Z

        TAJO-1199: EMR bootstrap script for Tajo


        githubbot ASF GitHub Bot added a comment -

        Github user hyunsik commented on the pull request:

        https://github.com/apache/tajo/pull/275#issuecomment-66257171

        Hi @hys9958,

        Great job. I tested it on EMR several times. It works well, and it was very convenient. I'm very happy to see this work.

        It would be good to merge this script into the Tajo repository. Even better would be to submit it to the GitHub repository managed directly by Amazon (https://github.com/awslabs/emr-bootstrap-actions). If so, Amazon will provide the script and the Tajo release tarball on their S3.

        If that is OK with you, I can help you submit the patch to Amazon's repo.

        githubbot ASF GitHub Bot added a comment -

        Github user hys9958 commented on the pull request:

        https://github.com/apache/tajo/pull/275#issuecomment-66416294

        Ok~
        Thanks!

        githubbot ASF GitHub Bot added a comment -

        Github user hys9958 closed the pull request at:

        https://github.com/apache/tajo/pull/275


          People

          • Assignee:
            hys9958 YeonSu Han
            Reporter:
            hys9958 YeonSu Han
          • Votes:
            0
            Watchers:
            2
