Accumulo / ACCUMULO-1386 make it easy to run a single node accumulo instance / ACCUMULO-1405

Package MiniAccumuloCluster so that a user can interact without Hadoop/ZooKeeper installed

    Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 1.7.0
    • Component/s: mini
    • Labels:

      Description

      Accumulo's packaging is currently very lightweight because many libraries, like Apache Commons, are pulled from Hadoop's and ZooKeeper's classpaths. As a result, the MAC (MiniAccumuloCluster) cannot be run from accumulo-start unless Hadoop and ZooKeeper are installed.

          Activity

          Corey J. Nolet added a comment -

          Since the dependencies needed to run Accumulo aren't on the classpath unless Hadoop/ZooKeeper are installed on the system, maybe it makes sense to provide another tarball with the cluster packaged up, along with a script to get it up and running quickly. I'm thinking this tarball would have all of the necessary dependencies packaged inside.

          Josh Elser added a comment -

          This seems to be overlapping a bit with what Keith Turner did the first time around with Instamo.

          There's a lot of blocks already in place with the Instamo stuff. We might be able to add some smarts into it to make a tarball too (instead of just the maven archetype)? Thoughts?

          David Medinets added a comment -

          Corey, that's the approach I took with my accumulo_stackscript and accumulo-at-home github projects. Let's not make more 'official' work than needed. If someone wants to freeze an environment, create a github project and go to town. Otherwise, we'll be accumulating dependency objects over time.

          Corey J. Nolet added a comment -

          This conversation stems from ACCUMULO-1368, where there was discussion of using the MAC for a single node Accumulo and possibly packaging it for use in the accumulo-start script. It would be nice to continue that and figure out the best way to package and present it for quick and simple startup. If the maven plugin is the only way, I can live with that. Keith and I both seemed to agree that it would be nice if there was another way (without maven) to get it started as well.

          Keith Turner added a comment -

          There's a lot of blocks already in place with the Instamo stuff.

          I would not say the blocks are in place at this point in time. Maybe after ACCUMULO-1378 is complete, instamo could leverage that work. When a lot of the tickets under ACCUMULO-1386 are complete and it's easy to run a single node accumulo, how do we package that and make it available to users? Keep in mind that users will want to stop and start this instance; it's not transient. Users will also want to configure it. Maybe maven is a way to do this, I am not really sure at this point. Seems like it would be something different than instamo, maybe a maven plugin for managing a single node accumulo instance. Will this work w/ native maps? Another option is creating a tarball, rpm, and deb. I think it would be useful to develop a list of the pros and cons of each approach. I agree w/ David Medinets that we do want to be cognizant of the maintenance tail of whatever we create. We will have to continue to support it in 1.7, etc.

          Keith Turner added a comment -

          I thought of another possible option instead of specialized packaging. Include a script (maybe it's just a maven command) that pulls down the needed dependencies into the lib dir. So the user would be able to do the following.

          • untar accumulo-1.6.0-bin.tar.gz
          • cd accumulo-1.6.0
          • run script to download hadoop, zookeeper, etc jars into lib dir
          • ./bin/accumulo mini -p propsfile

          I think this would be easy for the user and easy for us.
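
          As a rough illustration of the "run script to download hadoop, zookeeper, etc jars into lib dir" step, the sketch below only computes where such jars would live in a repository that follows the standard Maven layout. The class name DepUrls, the choice of Maven Central as the source, and the listed artifact versions are assumptions made for the example; nothing in this ticket specifies them, and no download is actually performed.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch: map Maven coordinates to jar URLs that a download script could fetch. */
final class DepUrls {
    // Assumption: jars come from Maven Central, which uses the standard repository layout.
    static final String REPO = "https://repo1.maven.org/maven2/";

    /** Converts "groupId:artifactId:version" into the jar URL under REPO. */
    static String jarUrl(String coords) {
        String[] parts = coords.split(":");
        if (parts.length != 3) {
            throw new IllegalArgumentException(
                "expected groupId:artifactId:version, got " + coords);
        }
        String groupPath = parts[0].replace('.', '/');
        String artifact = parts[1];
        String version = parts[2];
        return REPO + groupPath + "/" + artifact + "/" + version
            + "/" + artifact + "-" + version + ".jar";
    }

    public static void main(String[] args) {
        // Hypothetical dependency list; the versions are placeholders for illustration.
        List<String> deps = Arrays.asList(
            "org.apache.zookeeper:zookeeper:3.4.5",
            "org.apache.hadoop:hadoop-core:1.0.4");
        for (String dep : deps) {
            System.out.println(jarUrl(dep));
        }
    }
}
```

          A real script would also need to resolve transitive dependencies, which is one reason a maven command may be the simpler route.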

          Corey J. Nolet added a comment -

          I just did an example assembly that copied all the dependencies (including provided) and packaged up the MAC as a shaded jar. It was 32 MB, and that doesn't seem realistic to release inside the main tarball (which is... 11 MB). I like the idea of having developers pull down the dependencies on their own using a script in the tarball. We've also mentioned non-developers using it to populate some data quickly for (possible) migration over to a fully-distributed cloud. In that latter case, can we guarantee that their systems have maven installed or even connectivity to the internet?

          I know there must be downsides as well, but what if a user could do this:

          • untar accumulo-mini-1.6.0-bin.tar.gz
          • cd accumulo-mini-1.6.0
          • ./bin/accumulo-mini -p propsFile
          Corey J. Nolet added a comment -

          Keith Turner, I've been thinking about your latest idea of pulling down a bunch of dependencies instead of making separate packaging. I'm swaying in that direction currently. Seems like it'd be useful to have a lib/deps directory that could be removed easily just in case the user wants to run Accumulo against (possibly a different version of) hadoop/zk. Maybe the script could pull down everything and filter any duplicates against the deps in lib/.
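
          One possible way to "filter any duplicates against the deps in lib/" is to compare jars by artifact name with the version stripped off. The sketch below assumes jar files are named artifact-version.jar and that the version begins at the first hyphen followed by a digit; the class and method names (DepFilter, artifactName, notAlreadyInLib) are hypothetical and the file names are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch: keep only downloaded jars whose artifact is not already present in lib/. */
final class DepFilter {
    /** Strips ".jar" and a trailing "-<version>", e.g. "zookeeper-3.4.5.jar" becomes "zookeeper". */
    static String artifactName(String jarFileName) {
        String base = jarFileName.endsWith(".jar")
            ? jarFileName.substring(0, jarFileName.length() - 4)
            : jarFileName;
        // Assumption: the version starts at the first '-' that is followed by a digit.
        return base.replaceFirst("-\\d.*$", "");
    }

    /** Returns the downloaded jars whose artifact name does not appear among libJars. */
    static List<String> notAlreadyInLib(List<String> libJars, List<String> downloaded) {
        Set<String> present = new HashSet<>();
        for (String jar : libJars) {
            present.add(artifactName(jar));
        }
        List<String> keep = new ArrayList<>();
        for (String jar : downloaded) {
            if (!present.contains(artifactName(jar))) {
                keep.add(jar);
            }
        }
        return keep;
    }

    public static void main(String[] args) {
        // Illustrative file names only.
        List<String> lib = Arrays.asList("commons-io-2.1.jar", "accumulo-core-1.6.0.jar");
        List<String> deps = Arrays.asList(
            "commons-io-2.4.jar", "zookeeper-3.4.5.jar", "hadoop-core-1.0.4.jar");
        System.out.println(notAlreadyInLib(lib, deps));
    }
}
```

          Matching on artifact name alone means a different version already in lib/ wins over the downloaded one, which may or may not be the desired behavior.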

          Keith Turner added a comment -

          Corey J. Nolet, I have been thinking about an accumulo-mini-1.6.0-bin.tar.gz distro from the perspective of a user. Why can't I run ./bin/start-all.sh and ./bin/stop-all.sh? Why don't I run init? Why do I configure tserver memory in a different place (props file vs accumulo-env.sh)? These are rhetorical questions. It seems that w/o working on the scripts (which I am not advocating for), this alternate distribution would present an entirely different experience. It also made me think about the monitor and gc; MAC does not start these. They are not really needed for unit tests, but they probably should be started for the single node use case.

          A maven plugin to start and stop a single node Accumulo instance and a script in contrib to pull down deps into lib/deps seem like a good way to go.

          Corey J. Nolet added a comment -

          I was actually quite surprised to find out that running the Hadoop mini cluster from the command line doesn't try to replicate how a Hadoop fully-distributed cloud is configured. I think moving the mini cluster to its own module (ACCUMULO-1438) is a step in the right direction.

          Keith Turner, I agree that the goal of having another artifact would be to present a familiar and consistent interface to the user, such that configuring & running a "mini accumulo cluster" appears to be no different from running a fully-distributed cluster. I'm not exactly sold that they need to be exactly the same, though, because they aren't. I agree that having another top-level tarball would not be the answer. It would just be something else that would need to be maintained & released. Perhaps something as simple as a shaded jar (that includes Hadoop/ZK dependencies) could get the job done.

          Corey J. Nolet added a comment -

          I've been leaning towards a script or pom that can do a copy-dependencies for hadoop/zookeeper so that the MAC can run without having them installed.

          Christopher Tubbs added a comment -

          I'm wondering what this solves that a "run" mojo on accumulo-maven-plugin wouldn't (see also "mvn jetty:run")

          Corey J. Nolet added a comment -

          Now that you mention it, it doesn't. I would love to have a "mvn accumulo:run" mojo. That would be a HUGE win. Can we achieve this with your current StartMojo/StopMojo or can we plug in the MiniAccumuloRunner to kill two birds with one stone? Do we have a ticket for it yet?

          Christopher Tubbs added a comment -

          I was hoping to link it to MiniAccumuloRunner (actually, I was hoping the plugin would supersede both MiniAccumuloCluster and MiniAccumuloRunner eventually). I'm not sure if there's already a ticket for the RunMojo. I looked into it at some point, but it fell off my radar.

          Corey J. Nolet added a comment -

          It would be extremely useful if all of the properties that configure the MAC could be dumped to a single place: either the accumulo-site.xml, or simply a copy of the property file that is passed into the MiniAccumuloRunner.
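
          For the property-file variant of this idea, a minimal sketch using only java.util.Properties might look like the following. The class name MiniProps and the keys instanceName and numTServers are hypothetical placeholders chosen for illustration; they are not taken from this ticket and should not be read as the keys MiniAccumuloRunner actually accepts.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Sketch: write effective settings to one properties-format text and read them back. */
final class MiniProps {
    /** Serializes the properties in the standard java.util.Properties text format. */
    static String dump(Properties props) {
        StringWriter out = new StringWriter();
        try {
            props.store(out, "effective mini cluster settings (illustrative)");
        } catch (IOException e) {
            // StringWriter does not fail in practice; rethrow unchecked to keep callers simple.
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    /** Parses text previously produced by dump. */
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Hypothetical keys and values, for illustration only.
        props.setProperty("instanceName", "mini");
        props.setProperty("numTServers", "2");
        System.out.print(dump(props));
    }
}
```

          In practice the text would be written to a file next to the instance directory so that a later start could reuse the same settings.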


            People

            • Assignee: Corey J. Nolet
            • Reporter: Corey J. Nolet
            • Votes: 1
            • Watchers: 4
