Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.0.0
    • Fix Version/s: None
    • Component/s: scripts, start
    • Labels:
      None

      Description

      While there are some Accumulo images on DockerHub, it looks like the majority of them are designed to run a single-node Accumulo instance in a Docker container for development and testing.

      It would be great if Accumulo had an official image for running Accumulo processes in containers on a production cluster. The image could be published to DockerHub as an official 'apache/accumulo' image.

      In order to make this possible, I think work needs to be done to allow configuration to be passed to the Accumulo process in the Docker container without using configuration files (passing files to a running container is awkward in Docker). One way to do this is to add a --upload-accumulo-site option to the 'accumulo init' command, which the user runs outside of Docker. During Accumulo initialization, this would set the properties in accumulo-site.xml as system properties in Zookeeper. Accumulo processes in Docker containers could then be started with minimal configuration by updating the 'accumulo <service>' commands to accept a -o key=value option for overriding configuration. These changes to Accumulo would enable the following commands to start an Accumulo cluster in Docker:

      accumulo init --upload-accumulo-site
      docker pull apache/accumulo
      docker run apache/accumulo master -o instance.zookeeper.host=zkhost:2181
      docker run apache/accumulo tserver -o instance.zookeeper.host=zkhost:2181
      docker run apache/accumulo monitor -o instance.zookeeper.host=zkhost:2181
      docker run apache/accumulo tracer -o instance.zookeeper.host=zkhost:2181
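The docker run lines above imply the image dispatches on its first argument. A minimal sketch of what such an entrypoint could look like (the function name and the ACCUMULO_LAUNCHER variable are assumptions for illustration, not part of any existing apache/accumulo image; a real image would simply exec 'accumulo'):

```shell
# Hypothetical entrypoint dispatch for an apache/accumulo image: the first
# argument selects the Accumulo service (master, tserver, monitor, tracer, ...)
# and any remaining arguments (e.g. -o key=value overrides) are passed through
# unchanged. ACCUMULO_LAUNCHER is an assumed hook so the sketch can be
# exercised without a real Accumulo install.
run_accumulo_service() {
  local launcher="${ACCUMULO_LAUNCHER:-accumulo}"
  local service="$1"
  shift
  exec "$launcher" "$service" "$@"
}
```

With such a dispatch, 'docker run apache/accumulo tserver -o instance.zookeeper.host=zkhost:2181' would end up executing 'accumulo tserver -o instance.zookeeper.host=zkhost:2181' inside the container.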
      

        Activity

        mjwall Michael Wall added a comment -

        What are your thoughts on running HDFS and Zookeeper? It seems like you expect these to be running already. Can there ever be data locality?

        mikewalch Mike Walch added a comment -

        The image is designed for Hadoop & Zookeeper to be running already. While I am sure it is possible to run HDFS & Zookeeper in Docker, it's a lot more complicated than for stateless applications like Accumulo. If someone ever comes up with a good Hadoop or Zookeeper image for Docker, I would consider it, but until then I would just run them outside of Docker.

        ctubbsii Christopher Tubbs added a comment -

        In the past, we had RPMs and DEBs built upstream as "official" packaging of Accumulo. We ended up abandoning those "official" RPMs and DEBs, and stripped out downstream packaging elements, in favor of delegating that maintenance to the proper downstream communities (if they desired to pick it up).

        There is an analogy here with Docker. I see Docker as another downstream packaging effort. That said, I think there are some important differences with Docker that differentiate it from the RPM/DEB packaging that we did before.

        • First, it's clear that the way this would happen is as a separate, sibling module from the main upstream Accumulo build. This is different from what we did before, when we baked the RPM/DEB builds into our main artifact builds. Keeping this as a separate git repo, which depends on the main build, is a great idea.
        • Second, unlike RPM/DEB, which are tightly coupled to the downstream packaging conventions of Fedora, Docker is somewhat independent of any particular downstream environment. It makes sense that if a community of people wants to work on that (even if that is just one person), we could host that effort, since there's no other logical place for it to exist.

        Overall, I'm in favor of this effort, and support it as long as there is somebody interested in doing the work.

        As for the idea of uploading config to ZK during init, I think that's sensible. I think we could hijack the existing "SystemConfiguration" mechanism to push configuration there. We'll need a bootstrap mechanism to get these services connected to ZK in the first place, though. For that, I think we should seriously consider revamping our configuration mechanisms to use commons-configuration2's CompositeConfiguration, with the composition being: the accumulo-site.properties file, overridden by Java system properties, which we can set on the command line. Then, we can simply inject the command-line args for the system properties which provide ZK connection details into the workers. I've wanted this change (or something like it) for a while, and I think it would easily help with this case.
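        The layering described here (site file overridden by properties set on the command line) can be illustrated with a small lookup sketch. The function name and the key=value site-file format are assumptions for illustration; in Accumulo itself this would be commons-configuration2's CompositeConfiguration in Java, not shell:

```shell
# Hypothetical sketch of the proposed lookup order: a value supplied as an
# override (standing in for a Java system property or -o key=value flag) wins;
# otherwise the key is read from a key=value site file.
resolve_property() {
  local key="$1" override="$2" site_file="$3"
  if [ -n "$override" ]; then
    echo "$override"    # override layer: system property / -o key=value
  else
    # fallback layer: first matching key in the site file
    sed -n "s/^${key}=//p" "$site_file" | head -n 1
  fi
}
```

        Under this scheme, a container started with only the ZK connection override would resolve everything else from the layers behind it.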


          People

          • Assignee: mikewalch Mike Walch
          • Reporter: mikewalch Mike Walch
          • Votes: 0
          • Watchers: 3

          Dates

          • Created:
          • Updated:

          Development