Hadoop Common / HADOOP-7977

Allow Hadoop clients and services to run in an OSGi container

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.24.0
    • Fix Version/s: None
    • Component/s: util
    • Labels: None
    • Environment:

      OSGi client runtime (Spring etc.), possibly service runtime (e.g. Apache Karaf)

    Description

      There's been past discussion on running Hadoop client and service code in OSGi. This JIRA issue exists to wrap up the needs and issues.

      1. client-side use of public Hadoop APIs would seem most important.
      2. service-side deployments could offer benefits. The non-standard Hadoop Java security configuration may interfere with this goal.
      3. testing would be entirely functional, with dependencies on external services, which makes things harder.
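
      For illustration, OSGi-enabling a jar mostly means adding metadata headers to its MANIFEST, typically generated at build time via maven-bundle-plugin/bnd. A minimal sketch of such bnd instructions follows; the package names and versions are illustrative examples, not actual patch contents:

      {code}
      # Illustrative bnd instructions (as consumed by maven-bundle-plugin);
      # package names and versions are examples, not the actual patch.
      Bundle-SymbolicName: org.apache.hadoop.hadoop-common
      Bundle-Version: 0.24.0
      Export-Package: org.apache.hadoop.fs;version="0.24.0", \
       org.apache.hadoop.conf;version="0.24.0"
      Import-Package: org.apache.commons.logging;resolution:=optional, *
      {code}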

          Activity

          Jean-Baptiste Onofré added a comment -

          I have something ready for that (patch will be submitted today), including the features for Apache Karaf.

          Can someone assign this Jira to me?
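
          For context, a Karaf features descriptor of the kind mentioned above generally looks like the following; the Maven coordinates are illustrative examples, not the contents of the pending patch:

          {code:xml}
          <!-- Illustrative feature descriptor; coordinates are examples only,
               not the actual patch contents. -->
          <features name="hadoop-0.24.0">
            <feature name="hadoop-common" version="0.24.0">
              <bundle>mvn:org.apache.hadoop/hadoop-common/0.24.0</bundle>
              <bundle>mvn:commons-configuration/commons-configuration/1.6</bundle>
            </feature>
          </features>
          {code}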

          Steve Loughran added a comment -

          Linking to the issue I found w.r.t. IPC and the security manager; this may be invalid due to all the IPC changes since then.

          Steve Loughran added a comment -

          To answer Sanjay's questions in HADOOP-6484:

          Benefits and target audience: Is this work targeted for managing/running hadoop for developers or for production use? Briefly describe the benefits.
          1. It could be used for better deployment/management of the services in small clusters, where the memory requirements of the NN and JT aren't great, and being able to deploy the entire set of services for a worker node or a (single) master node in a single process would result in a lighter system load.
          2. If the TT started (mapred) tasks within the OSGi container (or a preloaded peer OSGi container), Map and Reduce jobs would be able to execute without all the JVM startup delays.
          Besides adding the manifests to jar files will it require adding more invasive changes such as special interfaces for stopping and starting hadoop daemons?
          • Adding the headers will have no impact on the existing daemons, because they don't run in an OSGi container.
          • Nor does any of the Hadoop code play games with classloaders, which is one thing that OSGi does differently.
          • HADOOP-5731 shows a problem that existed when trying to run IPC under a security manager; this may be a barrier to OSGi container use. If it exists client-side, that is something that may need fixing anyway, if it is still there after a switch to protobuf everywhere.
          • The MRv2 service model could be reused by OSGi helper code to manage service lifecycles, because you no longer need per-service code to start/stop services.
          • I'd expect some new entry points to be needed to start the services under OSGi, but they should be wrapper layers on the existing code (see the sketch after this list). If they depended on OSGi services they could be off to one side; if they needed to be in the same package as existing code, things might get trickier.
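
          A minimal sketch of what such a wrapper entry point might look like; the Service interface below is a stand-in for the MRv2 lifecycle API, and all names are illustrative, not actual Hadoop classes:

          {code:java}
          // Illustrative only: a hypothetical OSGi entry point wrapping an
          // MRv2-style service. "Service" is a stand-in, not the real Hadoop API.
          import org.osgi.framework.BundleActivator;
          import org.osgi.framework.BundleContext;

          public class HadoopDaemonActivator implements BundleActivator {

              // Stand-in for the MRv2-style lifecycle (start/stop).
              interface Service {
                  void start();
                  void stop();
              }

              private Service daemon;

              @Override
              public void start(BundleContext context) {
                  daemon = createDaemon();
                  daemon.start(); // the container drives the lifecycle, not main()
              }

              @Override
              public void stop(BundleContext context) {
                  if (daemon != null) {
                      daemon.stop();
                  }
              }

              private Service createDaemon() {
                  // wire configuration and wrap the real daemon here
                  return new Service() {
                      public void start() { /* start the wrapped daemon */ }
                      public void stop()  { /* stop the wrapped daemon */ }
                  };
              }
          }
          {code}

          The point is that the container, not a main() method, drives start/stop, so the wrapper stays a thin layer over the existing daemon code.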
          Will this be used for management after deployment has been done through some other mechanism or will this work also enable the deployment in a cluster?
          • Karaf is interesting in that it is not just yet another OSGi container: it has a built-in SSHD, so anyone can ssh in remotely, authenticate themselves, and issue management commands (start/stop services, see logs, etc.):
            http://felix.apache.org/site/41-console-and-commands.html - and it works on Windows too, which doesn't normally ship with an sshd.
          • I wonder if you can get at the logs through Karaf, including any from jobs stored on the workers? That would be useful.
          • Karaf itself doesn't do remote deployment, AFAIK. Bringing up a ZooKeeper client on each Karaf instance and waiting for instructions via ZK would always be possible (sketched below).
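
          A minimal sketch of that ZK idea, assuming a hypothetical command znode and a plain ZooKeeper client; the path and command format are invented for illustration:

          {code:java}
          // Illustrative only: each Karaf instance watches a ZooKeeper node and
          // reacts to start/stop commands. Paths and formats are assumptions.
          import org.apache.zookeeper.WatchedEvent;
          import org.apache.zookeeper.Watcher;
          import org.apache.zookeeper.ZooKeeper;

          public class KarafCommandWatcher implements Watcher {

              private static final String COMMAND_NODE = "/cluster/commands/worker-1"; // hypothetical
              private final ZooKeeper zk;

              public KarafCommandWatcher(String connectString) throws Exception {
                  this.zk = new ZooKeeper(connectString, 30000, this);
                  zk.getData(COMMAND_NODE, this, null); // set the initial watch
              }

              @Override
              public void process(WatchedEvent event) {
                  if (event.getType() == Event.EventType.NodeDataChanged) {
                      try {
                          byte[] data = zk.getData(COMMAND_NODE, this, null); // re-arm the watch
                          String command = new String(data, "UTF-8");
                          // dispatch "start"/"stop" etc. to the local services here
                      } catch (Exception e) {
                          // handle connection loss / retries in real code
                      }
                  }
              }
          }
          {code}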

          Overall, I think it could be good: adding the headers is low risk, and the other features could be useful, though it will take some work to see what problems arise.

          Jean-Baptiste Onofré added a comment -

          I provided a first patch for the OSGi statements in the Hadoop Common MANIFEST files.

          The second step is to use Blueprint to register Hadoop parts as OSGi services.
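
          A rough sketch of what such a Blueprint descriptor could look like; the choice of beans and the exposed FileSystem service are illustrative assumptions, not the actual patch:

          {code:xml}
          <!-- Illustrative Blueprint descriptor; the wiring shown here is an
               example, not the actual patch contents. -->
          <blueprint xmlns="http://www.osgi.org/xmlns/blueprint/v1.0.0">

            <bean id="hadoopConf" class="org.apache.hadoop.conf.Configuration"/>

            <!-- FileSystem.get(Configuration) as a static factory method -->
            <bean id="fileSystem" class="org.apache.hadoop.fs.FileSystem"
                  factory-method="get">
              <argument ref="hadoopConf"/>
            </bean>

            <service ref="fileSystem" interface="org.apache.hadoop.fs.FileSystem"/>
          </blueprint>
          {code}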

          Guillaume Nodet added a comment -

          Deploying Hadoop in Karaf kind of requires HADOOP-8572 for ease of use at dev time.

          Kihwal Lee added a comment -

          Do you have any jira/patch that adds the hadoop-karaf module?


            People

            • Assignee: Unassigned
            • Reporter: Steve Loughran
            • Votes: 2
            • Watchers: 12
