ACCUMULO-3871

run integration tests as a map/reduce job

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.8.0
    • Component/s: test
    • Labels: None

      Description

      When the functional tests were moved to Java, we lost the ability to run the tests via map/reduce. It would be nice to run the ITs in under 2 hours and take advantage of an entire cluster, especially after making large, sweeping changes.

          Activity

          ecn Eric Newton added a comment -

          I'm running into some trickiness finishing this work.

          Several tests have some test dependencies. I've been adding these to my -libjars list, but some of these tests have a lot of transitive dependencies (I'm looking at you, Kerberos).

          So, the suggested way of fixing this would be to move the test/src/test/java code into its own project, and move the classes from test/src/test/java to project/src/main/java.

          The fact that these two jars share the org.apache.accumulo.test namespace has already been an issue, so I had to stop Sealing the test-jar.

          So, I would like to move these tests to a new project (it-test? test-it? cluster-test?), and maybe shade in the dependencies so invoking the map-reduce job is simpler. I've never used shading before, so I'm not sure if this would be prohibitive. Of course, the test jar would not be in the accumulo-bin.tar.gz file unless we use a specific profile.

          Alternatively, there might be some way to use maven to discover the recursive test dependencies and pull them down at runtime.

          Any thoughts?
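As a rough illustration of the last idea: the value passed to -libjars is a comma-separated list, and here it would be the transitive closure of the test jar's dependency graph. The sketch below is plain Java, not Maven's resolver API. The `LibJarsSketch` class and the artifact names in `main` are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch only: computes a -libjars style list from a hand-written dependency graph. */
final class LibJarsSketch {

  /** Returns root's direct and transitive dependencies in breadth-first order. */
  static Set<String> transitiveDeps(Map<String, List<String>> graph, String root) {
    Set<String> seen = new LinkedHashSet<>();
    Deque<String> queue = new ArrayDeque<>();
    queue.add(root);
    while (!queue.isEmpty()) {
      String current = queue.remove();
      for (String dep : graph.getOrDefault(current, List.of())) {
        // Skip the root itself (cycles) and anything already collected.
        if (!dep.equals(root) && seen.add(dep)) {
          queue.add(dep);
        }
      }
    }
    return seen;
  }

  /** Joins the jars with commas, the separator that -libjars expects. */
  static String libJarsArg(Map<String, List<String>> graph, String root) {
    return String.join(",", transitiveDeps(graph, root));
  }

  public static void main(String[] args) {
    // Hypothetical artifacts; a real list would come from the build tool.
    Map<String, List<String>> graph = Map.of(
        "test.jar", List.of("minikdc.jar", "junit.jar"),
        "minikdc.jar", List.of("directory-a.jar", "directory-b.jar"),
        "directory-a.jar", List.of("directory-b.jar"));
    System.out.println(libJarsArg(graph, "test.jar"));
  }
}
```

The hard part in practice is obtaining the graph, which is the piece that would have to come from Maven.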

          elserj Josh Elser added a comment -

          some of these tests have a lot of transitive dependencies (I'm looking at you, Kerberos).

          :smile:. Yeah, I know that Apache Directory pulls in a ton of deps just to get the MiniKdc functionality. It would be nice if we could work with them to pare that down in the future.

          So, the suggested way of fixing this would be to move the test/src/test/java code into its own project, and move the classes from test/src/test/java to project/src/main/java.

          The fact that these two jars share the org.apache.accumulo.test namespace has already been an issue, so I had to stop Sealing the test-jar.

          So, I would like to move these tests to a new project (it-test? test-it? cluster-test?), and maybe shade in the dependencies so invoking the map-reduce job is simpler. I've never used shading before, so I'm not sure if this would be prohibitive. Of course, the test jar would not be in the accumulo-bin.tar.gz file unless we use a specific profile.

          At first read, it seems like we could make a mapreduce-it module (or something) that just shades in the dependencies and test classes you need. Maybe you could keep the existing accumulo-test jar off of the classpath, and that would prevent you from having to unseal the existing accumulo-test jar. Not entirely sure... this is the time I usually would ask Christopher Tubbs.
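For background on the sealing problem: all classes in a sealed package must be loaded from the same jar, so two jars that both contribute classes to org.apache.accumulo.test cannot both be on the classpath if that package is sealed. Below is a minimal sketch, using only java.util.jar, of how a manifest marks a package as sealed. The `SealingSketch` class and its method names are made up for illustration, and this is not Accumulo's actual build configuration.

```java
import java.util.jar.Attributes;
import java.util.jar.Manifest;

/** Sketch only: models the "Sealed" manifest attribute with in-memory manifests. */
final class SealingSketch {

  /** Builds a manifest whose main section seals (or does not seal) the whole jar. */
  static Manifest manifest(boolean sealed) {
    Manifest mf = new Manifest();
    Attributes main = mf.getMainAttributes();
    main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
    main.put(Attributes.Name.SEALED, Boolean.toString(sealed));
    return mf;
  }

  /** Overrides the sealing of one package, as a per-entry manifest section would. */
  static void setPackageSealed(Manifest mf, String packageName, boolean sealed) {
    Attributes entry = new Attributes();
    entry.put(Attributes.Name.SEALED, Boolean.toString(sealed));
    mf.getEntries().put(packageName.replace('.', '/') + "/", entry);
  }

  /** A per-package "Sealed" entry wins; otherwise the jar-wide main attribute applies. */
  static boolean isSealed(Manifest mf, String packageName) {
    String entryName = packageName.replace('.', '/') + "/";
    Attributes entry = mf.getAttributes(entryName);
    String value = entry == null ? null : entry.getValue(Attributes.Name.SEALED);
    if (value == null) {
      value = mf.getMainAttributes().getValue(Attributes.Name.SEALED);
    }
    return "true".equalsIgnoreCase(value);
  }
}
```

Unsealing only the shared package, rather than the whole jar, is one option this model makes visible. Whether that fits the test jar here is a separate question.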

          ecn Eric Newton added a comment -

          I asked everyone, via the dev list. I'm certainly going to talk to Christopher Tubbs, mostly because I would need help twisting maven into submission.

          elserj Josh Elser added a comment -

          I asked everyone, via the dev list. I'm certainly going to talk to Christopher Tubbs,

          I understand that. I was adding a caveat to my recommendation.

          ctubbsii Christopher Tubbs added a comment -

          So, we're kind of already doing the "suggested way", by having our own test module. This is equivalent to a separate project with one variation: the stuff in src/test/java should be moved into src/main/java, which would also require moving test-scoped dependencies to compile (or runtime) scope.

          I think it'd be fine to just complete moving the src/test/java stuff in the test jar to src/main/java to make all this work well. For our tests, this would mean we'd have to direct maven-failsafe-plugin to look for ITs in src/main/java instead of src/test/java. However, I don't think that would be difficult.

          It would also be convenient to have a shaded jar built in the test module (in a profile, please... not active by default).

          One thing we may want to also consider is whether we should be deploying the test module artifacts at all. We probably never really should have been doing this, but with the move of stuff from src/test/java to src/main/java, the resulting jar is going to be much larger (full of ITs that are completely useless outside the build/testing of accumulo itself), so we may want to disable this module entirely in the release profile, so it doesn't get deployed to maven central.

          elserj Josh Elser added a comment -

          so we may want to disable this module entirely in the release profile, so it doesn't get deployed to maven central.

          This was my thinking behind making an entirely new jar for Eric's mapreduce desires. We can make a jar specifically for this purpose, shade to his heart's content, and not deploy it. I don't think I'd be in favor of removing the accumulo-test jar from our deployment. It's just too much of a normal thing for Accumulo IMO – I'd be confused if it's in my installation but not available in central.

          ecn Eric Newton added a comment -

          My goal is to do a build, maybe with a different profile, that will build this ugly, giant IT-laden jar, that can be part of my tar ball.

          I have scripts (from Keith Turner and Mike Walch's fluo-deploy) that I can use to fire up a cluster at Amazon. I would like to throw just one tarball to the test cluster. I should also have access to some bare-metal cluster(s), and having a single, built tarball would be kinda handy.

          So, we can either do it in a new jar, under a new package, or I can do it to the test jar. Either way, I expect to rebuild the bin.tar.gz for my own testing. Well, actually my maven knowledge is limited, so I would need some help from Christopher Tubbs to get a solution.

          I don't agree that the only things in test are ITs. The Random Walk framework is in there. If we want to run Random Walk as part of our tests, or support others writing new Random Walk tests, we need to deploy the test jar. I'm pretty sure the Continuous Ingest tests are in there, too.

          ctubbsii Christopher Tubbs added a comment -

          I don't agree that the only things in test are ITs.

          Yeah, I wasn't really thinking about the rest. If we expect people to still be able to run CI and RW, etc., from the tarball, it might make sense to deploy it (even if it does now include some extra ITs).

          elserj Josh Elser added a comment -

          I see you committed a change to master which moved all of the integration tests from src/test/java to src/main/java. Can I ask why you did that? It seemed like the approach you were heading towards was creating a special fatjar with all of the stuff you needed to avoid libjars taking forever. It's not apparent to me why you needed to move the ITs to src/main, as you could have just used the accumulo-test-$version-tests.jar. I'm concerned about not having our test code in the normal location just to support something that is not going to be generally used by everyone (most people won't have many nodes to run the ITs via mapreduce).

          Thanks.

          ctubbsii Christopher Tubbs added a comment -

          This link describes the reason for the transition, under "the preferred way".

          elserj Josh Elser added a comment -

          Thanks for the link. I missed that point in Eric's original comment.

          Any suggestions on how we can prevent ITs from showing up again in test/src/test/java? Checkstyle rule? Developer section in the user manual? Email?

          ctubbsii Christopher Tubbs added a comment -

          I'd have to look into that. Right now, the best I can think of is a commit-hook or a custom exec-maven-plugin execution. However, there are valid reasons for ITs to exist in that section (though currently failsafe won't execute them, I think): that is, ITs which test the test frameworks themselves (RW, CI, etc.).
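One possible shape for such a check, whether it is run from a commit hook or from exec-maven-plugin, is a small program that lists files under a directory whose names follow the IT naming pattern and exits non-zero if any are found. This is a sketch under assumptions: the `*IT.java` suffix convention and the `ItLocationCheck` name are illustrative. Because some ITs may legitimately live in that directory, a real check would likely need an allow-list.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Sketch only: flags source files that look like ITs under a given directory. */
final class ItLocationCheck {

  /** True for file names such as FooIT.java (assumed naming convention). */
  static boolean looksLikeIT(String fileName) {
    return fileName.endsWith("IT.java");
  }

  /** Lists every regular file under root whose name looks like an IT. */
  static List<Path> findITs(Path root) throws IOException {
    try (Stream<Path> paths = Files.walk(root)) {
      return paths
          .filter(Files::isRegularFile)
          .filter(p -> looksLikeIT(p.getFileName().toString()))
          .sorted()
          .collect(Collectors.toList());
    }
  }

  public static void main(String[] args) throws IOException {
    // Directory to scan; defaults to the location discussed in this thread.
    Path root = Paths.get(args.length > 0 ? args[0] : "test/src/test/java");
    List<Path> offenders = findITs(root);
    offenders.forEach(p -> System.err.println("IT found in test sources: " + p));
    // A non-zero exit status lets a hook or build step fail.
    System.exit(offenders.isEmpty() ? 0 : 1);
  }
}
```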

          ctubbsii Christopher Tubbs added a comment -

          Looking at the commits so far, I'm curious if we can't just put the IntegrationTestMapReduce class in the test jar, and have a profile in that which creates a shaded jar for testing.

          The advantages of this are primarily:

          1. one less module (simplicity)
          2. no publishing a shaded jar to central on release (avoids potential pitfalls for users encountering these jars)
          ctubbsii Christopher Tubbs added a comment -

          There's also the issue of faster normal builds (creating shaded jar takes time).

          ecn Eric Newton added a comment - edited

          It's in an intermediate form. I have no intention of building the Big Jar with every build. If you want to wrangle maven into a better form, please do. That was a big reason why I checked it in, as-is.

          ctubbsii Christopher Tubbs added a comment -

          Okay, cool. I'll take a look and see what I can help with.

          ctubbsii Christopher Tubbs added a comment -

          Eric Newton, I rolled the mrit module into the test module as an optional second artifact (shaded), so it won't build by default.

          ecn Eric Newton added a comment -

          +1, thanks Christopher Tubbs.

          ctubbsii Christopher Tubbs added a comment -

          Instead of those if statements to conditionally skip tests under mrit, you could use junit's Assume controls. That way, the test reports will indicate the test was skipped instead of that the test passed successfully. It's a minor thing though.
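The difference can be modelled without JUnit. In the sketch below, an early `return` is indistinguishable from a pass, while a dedicated exception lets the runner report a skip. This is roughly what JUnit's `Assume` methods do by throwing an assumption-violation exception. All names here (`SkipSketch`, `SkippedException`, `assume`) are stand-ins for illustration, not JUnit's API. In a real test the call would be something like `org.junit.Assume.assumeTrue(...)`.

```java
/** Sketch only: a tiny model of how a test runner tells "skipped" apart from "passed". */
final class SkipSketch {

  enum Outcome { PASSED, SKIPPED, FAILED }

  /** Stand-in for the exception a test framework uses to signal "assumption not met". */
  static final class SkippedException extends RuntimeException {
    SkippedException(String message) {
      super(message);
    }
  }

  /** Stand-in for an assume-style check: aborts the test as skipped when the condition is false. */
  static void assume(boolean condition, String message) {
    if (!condition) {
      throw new SkippedException(message);
    }
  }

  /** Minimal runner: classifies a test body by how it terminates. */
  static Outcome run(Runnable test) {
    try {
      test.run();
      return Outcome.PASSED;
    } catch (SkippedException e) {
      return Outcome.SKIPPED;
    } catch (RuntimeException | AssertionError e) {
      return Outcome.FAILED;
    }
  }

  /** Skips with an if statement: the runner cannot tell this apart from a real pass. */
  static void testWithIf(boolean underMapReduce) {
    if (underMapReduce) {
      return;
    }
    // ... real test body would go here ...
  }

  /** Skips with an assumption: the runner sees the skip. */
  static void testWithAssume(boolean underMapReduce) {
    assume(!underMapReduce, "not supported under map/reduce");
    // ... real test body would go here ...
  }
}
```

Under this model, `run(() -> testWithIf(true))` yields `PASSED` and `run(() -> testWithAssume(true))` yields `SKIPPED`, which is the reporting difference described above.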

          ecn Eric Newton added a comment -

          Good idea! Thanks.

          ecn Eric Newton added a comment -

          Two performance tests are failing, but I believe that's due to the performance of the EC2 nodes I've been using to scale up the test.

          ecn Eric Newton added a comment -

          Comments in IntegrationTestMapReduce document how to run the ITs with map/reduce.
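The authoritative instructions are the comments in that class. As a rough mental model only, and assuming the job feeds test names to mappers, the work has a simple shape: each map step takes one test name and produces a status, and the results are then grouped by status. The sketch below models that shape in plain Java with a parallel stream. It does not use Hadoop or JUnit, the `runner` predicate is a hypothetical stand-in for actually executing a test, and it is not a copy of what IntegrationTestMapReduce does.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Sketch only: the map/reduce shape of "run many tests, summarize by status". */
final class MapReduceShapeSketch {

  /** "Map" step: one test name in, one (status, name) pair out. */
  static Map.Entry<String, String> mapOne(String testName, Predicate<String> runner) {
    String status;
    try {
      status = runner.test(testName) ? "passed" : "failed";
    } catch (RuntimeException e) {
      // A test that blows up is reported separately from an ordinary failure.
      status = "error";
    }
    return Map.entry(status, testName);
  }

  /** "Reduce" step: group the test names by status, with statuses in sorted order. */
  static Map<String, List<String>> runAll(List<String> testNames, Predicate<String> runner) {
    return testNames.parallelStream()
        .map(name -> mapOne(name, runner))
        .collect(Collectors.groupingBy(
            Map.Entry::getKey,
            TreeMap::new,
            Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
  }
}
```

Each test is independent of the others, which is what makes the suite a natural fit for spreading across a cluster.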


            People

            • Assignee: ecn Eric Newton
            • Reporter: ecn Eric Newton
            • Votes: 0
            • Watchers: 2

            Time Tracking

            • Original Estimate: Not Specified
            • Remaining Estimate: 0h
            • Time Spent: 4h 20m