Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.0
    • Fix Version/s: 1.0.0
    • Component/s: general
    • Labels:

      Description

      Please accept this patch, which adds GridGain to the BigTop product. The addition of GridGain was agreed upon in this thread:
      http://mail-archives.apache.org/mod_mbox/bigtop-dev/201403.mbox/%3CCA+0=VoXBik-=go=vmVb3PWbdv08dGBpEm_UD7NDiRr4hguxFPA@mail.gmail.com%3E

      1. latest-201410111925.patch
        45 kB
        Dmitriy Setrakyan
      2. 0001-BIGTOP-1490.-Adding-GridGain-to-BigTop.patch
        47 kB
        Ilya Tikhonov
      3. 0001-BIGTOP-1490.-Adding-GridGain-to-BigTop.patch
        48 kB
        Konstantin Boudnik
      4. 0001-BIGTOP-1490.-Adding-GridGain-to-BigTop.patch
        63 kB
        Ilya Tikhonov
      5. 0001-BIGTOP-1490.-Adding-GridGain-to-BigTop.patch
        61 kB
        Ilya Tikhonov
      6. 0001-BIGTOP-1490.-Adding-GridGain-to-BigTop.patch
        61 kB
        Ilya Tikhonov

        Issue Links

          Activity

          dsetrakyan Dmitriy Setrakyan added a comment -

          The patch is attached.

          jayunit100 jay vyas added a comment - - edited

          Thanks Dmitriy, this is pretty exciting!

          • A lot of people submit patches using git format-patch, so that the author metadata is correct. Especially for a big patch like this, you might want to consider that: https://cwiki.apache.org/confluence/display/BIGTOP/How+to+Contribute (or else you can just let me know the author's user name).
          • How do we test that GridGain is working? Is there a GridGain test we can run, or will it just speed up any of our existing Hadoop workloads once we enable it via Puppet?
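          For reference, a minimal sketch of producing a patch with git format-patch so that the author metadata travels with it. The throwaway repository, the file name, the commit subject, and the identity below are all hypothetical and only illustrate the shape of the workflow; the helper name make_example_patch is likewise an assumption, not part of Bigtop.

```shell
#!/bin/sh
# Sketch: create a patch that carries author metadata with git format-patch.
# The repository, file name, commit subject, and identity are hypothetical.

make_example_patch() {
    work=$(mktemp -d) || return 1
    (
        cd "$work" &&
        git init -q repo &&
        cd repo &&
        git config user.name "Example Author" &&
        git config user.email "author@example.org" &&
        echo "base" > component.txt &&
        git add component.txt &&
        git commit -q -m "Initial commit" &&
        echo "new component" >> component.txt &&
        git commit -q -a -m "EXAMPLE-1. Example change" &&
        # Write one patch file for the latest commit into ../patches.
        git format-patch -q -1 -o ../patches HEAD
    ) >/dev/null || return 1
    # Print the path of the generated patch file.
    ls "$work"/patches/*.patch
}
```

          The generated file starts with From:, Date:, and Subject: headers, which is what lets a committer apply it with git am and keep the original author.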
          cos Konstantin Boudnik added a comment - - edited

          Dmitriy Setrakyan, could you please chime in on the questions above?

          dsetrakyan Dmitriy Setrakyan added a comment -

          Jay, thanks for the comments.

          1. We will update the patch and resubmit shortly, so you will have the author metadata.
          2. We are also adding a test there, so you can confirm that GridGain is working.

          We should be done with the above by the end of the week.

          rvs Roman Shaposhnik added a comment -

          Typically, for every new component that is added to BigTop, we create at least the following three subtasks:

          • packaging code
          • deployment code
          • tests

          Any chance you guys can create those and tackle the changes?

          cos Konstantin Boudnik added a comment -

          @rvs, from what I see in the current patch it has

          • packaging code
          • simple deployment code
          • package tests

          Do we really need to formally split the patch into three different subtasks?

          As far as I understand, GG provides a caching layer on top of HDFS/MR and that can be tested with the existing hadoop tests if the configuration is tweaked a little bit. So, perhaps, no new tests are even needed.

          pelya Ilya Tikhonov added a comment -

          Attached is a new patch in git format with some fixes.

          cos Konstantin Boudnik added a comment -

          To catch reviewers' attention.

          rvs Roman Shaposhnik added a comment -

          In general it looks good. A couple of points, before committing:

          1. Let's either remove the man page or make it meaningful (at least not having blah-blah, and maybe pointing the reader to a URL).
          2. At this point there's one service, and yet I see DAEMON=".GRIDGAIN_DAEMON." substitutions going on. Do we expect other services to be added eventually?
          3. It seems there are 3 packages added, but only one was added to the package test manifest.

          And just for my own education: once I have a bunch of gridgain-hadoop services up and running on my cluster, is there any kind of web-based URL I can go to in order to see what's going on in the guts of the GridGain cluster?

          And one last question, since Ignite is now in the Incubator, what's the timeline of switching to that?

          cos Konstantin Boudnik added a comment - - edited

          Thanks for the review, Roman. My take:

          1. Let's use the latter approach. We don't need full man pages, but at least something would be nice, like a URL, as you said. The patch already has some, but let's add more.
          2. I believe that will be the case when more bits of the GridGain stack (or Apache Ignite (incubating)) are added. Right now only the Hadoop accelerator is there, but I envision more components and services coming along.
          3. Good catch!

          My own review comment would be to improve the Puppet deployment, but I'd suggest doing that in a separate JIRA to avoid holding up the commit for too long.

          cos Konstantin Boudnik added a comment -

          Added two more package tests.

          cos Konstantin Boudnik added a comment -

          Ilya Tikhonov, could you address the man pages comment by adding more references to the online docs?

          jayunit100 jay vyas added a comment - - edited

          Overall +1. Here are my notes. Can someone else also confirm that all the main components are in the patch by eyeballing it?

          • The patch looks pretty clean; there is a little trailing whitespace, but we can fix that when we apply it (git apply --whitespace=fix).
          • It looks like it builds okay using gradle gridgain-hadoop-rpm.
          • I also confirmed that the rpm package seems to build properly.

          Is this just the Hadoop part of GridGain, or the pure compute framework as well? (I thought GridGain had an alternative to MapReduce as well.)

          This is quite a sophisticated and large patch. I would love to see it accompanied by a mailing list announcement or a wiki page update.

          Requires: /bin/bash
          Processing files: gridgain-hadoop-doc-6.5.2-1.fc20.noarch
          Provides: gridgain-hadoop-doc = 6.5.2-1.fc20
          Requires(rpmlib): rpmlib(CompressedFileNames) <= 3.0.4-1 rpmlib(FileDigests) <= 4.6.0-1 rpmlib(PayloadFilesHavePrefix) <= 4.0-1
          Checking for unpackaged file(s): /usr/lib/rpm/check-files /home/apache/Development/bigtop-jayunit100/build/gridgain-hadoop/rpm/BUILDROOT/gridgain-hadoop-6.5.2-1.fc20.x86_64
          Wrote: /home/apache/Development/bigtop-jayunit100/build/gridgain-hadoop/rpm/RPMS/noarch/gridgain-hadoop-6.5.2-1.fc20.noarch.rpm
          Wrote: /home/apache/Development/bigtop-jayunit100/build/gridgain-hadoop/rpm/RPMS/noarch/gridgain-hadoop-service-6.5.2-1.fc20.noarch.rpm
          Wrote: /home/apache/Development/bigtop-jayunit100/build/gridgain-hadoop/rpm/RPMS/noarch/gridgain-hadoop-doc-6.5.2-1.fc20.noarch.rpm
          Executing(%clean): /bin/sh -e /var/tmp/rpm-tmp.Jas5xj
          + umask 022
          + cd /home/apache/Development/bigtop-jayunit100/build/gridgain-hadoop/rpm//BUILD
          + cd gridgain-release-6.5.2
          + /usr/bin/rm -rf /home/apache/Development/bigtop-jayunit100/build/gridgain-hadoop/rpm/BUILDROOT/gridgain-hadoop-6.5.2-1.fc20.x86_64
          + exit 0
          Executing(--clean): /bin/sh -e /var/tmp/rpm-tmp.e30TaV
          + umask 022
          + cd /home/apache/Development/bigtop-jayunit100/build/gridgain-hadoop/rpm//BUILD
          + rm -rf gridgain-release-6.5.2
          + exit 0
          
          BUILD SUCCESSFUL
          
          Total time: 6 mins 40.441 secs
          

          So, +1 from me! The patch is functional and creates an rpm package.

          Can one other Bigtop committer sign off on it at a high level before we commit?

          cos Konstantin Boudnik added a comment -

          Thanks for the review, Jay! Let's wait until the man page comment is addressed and then commit it. Looks like Roman is OK with the overall patch. It also looks good to me (modulo the Puppet stuff, for which I opened a separate ticket).

          jayunit100 jay vyas added a comment - - edited

          Okay, great. Ilya Tikhonov, you can easily add a smoke test to bigtop-tests/smoke-tests by dropping a Groovy file in there that does something simple (e.g. makes a simple GridGain API call). Just let me know if you need help with that. Alternatively:

          • Please create a separate JIRA to add smoke tests if you want to do it after the fact, but let's definitely add a GridGain smoke test.
          • Also, please clarify Roman Shaposhnik's question: what's the plan regarding Ignite? Seems like an interesting project.
          cos Konstantin Boudnik added a comment -

          Ignite is essentially the GridGain platform under the ASF umbrella. It was accepted into incubation about a month ago, and we are going through the initial chores of creating accounts, setting up JIRA, doing IP clearance, etc. Realistically, I hope we will have a first release in about a month or more. I will let the core developers of the project comment on it, though.

          pelya Ilya Tikhonov added a comment -

          I added a smoke test. It runs the standard wordcount example via the gridgain-hadoop engine.

          cos Konstantin Boudnik added a comment -

          Thanks Ilya Tikhonov! A couple of small comments on the test:

          • You import the Log facility yet keep using println for the output. Could you please use Log instead?
          • I'd suggest changing the input and output paths to something unique like gh-input and gh-output. Also, do not use the absolute path / because it is only writable by the user hdfs, and you aren't guaranteed to run the tests under that account.
          • I'd suggest moving the cleanup phase to a @Before method instead of calling it from the test itself.
          • I see two log4j.properties files, one in conf/ and one in the test directory. Do you think it's feasible to merge them?
          • Do you think it is possible to keep the configuration files under conf/? Or are they better off in the test directory?

          Last comment: the patch applies with a few whitespace warnings. Could you take a stab at fixing them? Thanks!
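          As an illustration of how such whitespace warnings can be located before resubmitting, here is a small sketch that lists the added lines of a patch file that end in whitespace. This is a plain grep check, not git's own whitespace logic (git apply --whitespace=fix can repair such lines at apply time), and the helper name find_trailing_whitespace is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list added lines ("+...") in a patch file that end
# in whitespace. Prints "line-number:content" for each offending line and
# returns 0 if at least one was found, non-zero otherwise.
find_trailing_whitespace() {
    grep -nE '^\+.*[[:space:]]+$' "$1"
}
```

          For example, find_trailing_whitespace 0001-BIGTOP-1490.-Adding-GridGain-to-BigTop.patch would print each added line that needs trimming.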

          pelya Ilya Tikhonov added a comment -

          Thanks Konstantin Boudnik! I fixed everything you asked for except writing files in the root directory.
          That is not a problem here: the test writes files into a GGFS instance, and GGFS doesn't support permissions in the current version. I saw that HDFS is initialized with the necessary directory tree, but that is not done for GGFS. I would rather not complicate this right now.

          Show
          pelya Ilya Tikhonov added a comment - Thanks Konstantin Boudnik ! I fixed everything you asked except file writing in root directory. It's not a problem. This test writes files into GGFS instance. GGFS doesn't support permissions in current version. I sow that HDFS has been initialized with necessary directory tree but it is not done for GGFS. I would not like to complicate this right now.
          Hide
          jayunit100 jay vyas added a comment - - edited

          Thanks for the smoke test !

          Question: if I ran yum remove gridgain and then ran this test, wouldn't the test still pass? It doesn't seem to be doing anything other than hadoop fs -put.

          Maybe I'm missing something?

          It would be awesome if there were a way to confirm, by making a call, that the small amount of data has been written to a GridGain-specific in-memory cache somewhere.

          pelya Ilya Tikhonov added a comment - - edited

          Hmmm. yum remove gridgain doesn't remove gridgain packages. I ran yum remove gridgain-hadoop and retested this.

          As a result, the service process was killed, and the command

          gradle compileGroovy clean test -Dsmoke.tests=gridgain-hadoop --info
          

          returned the following:

          org.apache.bigtop.itest.hadoop.mapreduce.TestGridGainHadoop > test FAILED
              org.junit.ComparisonFailure: Incorrect output expected:<[black	5
              blue	11
              green	11
              white	5
              yellow	11]> but was:<[]>
                  at org.junit.Assert.assertEquals(Assert.java:115)
                  at org.junit.Assert$assertEquals.callStatic(Unknown Source)
                  at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallStatic(CallSiteArray.java:50)
                  at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callStatic(AbstractCallSite.java:157)
                  at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callStatic(AbstractCallSite.java:173)
                  at org.apache.bigtop.itest.hadoop.mapreduce.TestGridGainHadoop.test(TestGridGainHadoop.groovy:70)
          


          About the test, it performs the following:

          Via the configs placed in the conf directory, it plugs in two GridGain components: the "MR processor" and the "Memory file system" (GGFS).

          Next, it runs the following commands:

          hadoop fs -mkdir /gh-input
          hadoop fs -put test.data /gh-input/
          hadoop jar $HADOOP_MAPRED_HOME/hadoop-mapreduce-examples.jar wordcount /gh-input /gh-output
          hadoop fs -cat /gh-output/part-r-00000
          

          None of these operations affect real Hadoop nodes in this mode.

          Next, the test script takes the output of the "cat" command and compares it with a hard-coded string:

          black	5
          blue	11
          green	11
          white	5
          yellow	11
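
          The steps above can be sketched as a standalone shell script. This is only an approximation of what the Groovy test does: the helper names check_wordcount_output and run_smoke_test, the RUN_GH_SMOKE switch, and the up-front cleanup are assumptions added for illustration, and the expected text is copied from the assertion message in the failure output above.

```shell
#!/bin/sh
# Sketch of the smoke-test flow described above (assumptions noted inline).

# Hypothetical helper: succeed only when the actual wordcount output
# matches the expected text exactly.
check_wordcount_output() {
    expected=$1
    actual=$2
    [ "$actual" = "$expected" ]
}

# Expected result, taken from the assertion message in the failure output
# (word and count are separated by a tab).
EXPECTED=$(printf 'black\t5\nblue\t11\ngreen\t11\nwhite\t5\nyellow\t11')

run_smoke_test() {
    # Assumption: remove leftovers first; errors are ignored on a fresh file system.
    hadoop fs -rm -r /gh-input /gh-output >/dev/null 2>&1 || true
    hadoop fs -mkdir /gh-input &&
    hadoop fs -put test.data /gh-input/ &&
    hadoop jar "$HADOOP_MAPRED_HOME/hadoop-mapreduce-examples.jar" wordcount /gh-input /gh-output &&
    actual=$(hadoop fs -cat /gh-output/part-r-00000) &&
    check_wordcount_output "$EXPECTED" "$actual"
}

# Only run against a real installation when explicitly requested.
if [ "${RUN_GH_SMOKE:-0}" = "1" ]; then
    run_smoke_test && echo "PASS" || echo "FAIL"
fi
```

          With the gridgain-hadoop packages removed, the final cat returns nothing, so the comparison fails in the same way as the "but was:<[]>" message shown earlier.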
          


          As for calling a cache-specific method, I need to consult with colleagues.

          If the test does not work for you, could you please provide more information on how you are running it?

          cos Konstantin Boudnik added a comment -

          Looks good. One last thing: should it be in the org.apache.bigtop.itest.hadoop.mapreduce package? Why not put it in the package org.apache.bigtop.itest.hadoop.gridgain-hadoop?

          pelya Ilya Tikhonov added a comment - - edited

          I renamed the package to org.apache.bigtop.itest.hadoop.gridgain. I don't think we need to write hadoop twice, and the minus symbol is not allowed in package names.

          About calling cache-specific methods:

          It would be nice to read the result file from GGFS by calling the native GridGain API. But right now we can do that only from the same process that is running the GridGain worker node. The native clean client mode for GGFS is in development.
          I propose fixing this in a future version.

          cos Konstantin Boudnik added a comment -

          +1 the patch looks good. I will commit it by the end of the day if I don't hear any other comments.

          Ilya Tikhonov, could you please update BIGTOP-1502 with the info on how you configure the GG layer after the installation so you can run the tests? It will help me quite a bit in improving the Puppet recipe, so that next time the installation and configuration will be done automatically. Thanks!

          cos Konstantin Boudnik added a comment -

          Pushed to
          8766429..70c8dec HEAD -> master

          Thanks Ilya!


            People

            • Assignee:
              pelya Ilya Tikhonov
              Reporter:
              dsetrakyan Dmitriy Setrakyan
            • Votes:
              0
              Watchers:
              5
