Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      I have been running some tests getting Hadoop to run within an OSGi environment (specifically the Newton framework), and this has uncovered a number of minor bugs that appear when mapred classes are instantiated from an entry point other than their main methods.

      I have created a number of patches, which I'll attach, that solve these issues. These patches could possibly be handled as separate issues, but all of them are required to resolve the OSGi problem. I'm happy to split them up if that is easier to manage.

      classpath.patch: rearranges the classloader hierarchies for Task objects so that a Task can resolve API classes when those classes are no longer loaded from the system classloader.
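
To illustrate the idea behind classpath.patch, here is a minimal sketch (not the patch itself) of a task classloader whose parent is whichever loader defined the API classes, instead of the system classloader. The class name `TaskClassLoaderSketch` and the helper `newTaskLoader` are hypothetical, and the sketch's own class stands in for a Hadoop API class.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class TaskClassLoaderSketch {

  /**
   * Builds a loader for job classes whose parent is the loader that defined
   * the given API class, rather than the system class loader.
   * (Hypothetical helper, for illustration only.)
   */
  static URLClassLoader newTaskLoader(URL[] jobClasspath, Class<?> apiClass) {
    ClassLoader apiLoader = apiClass.getClassLoader();
    if (apiLoader == null) {
      // Bootstrap-loaded classes report a null loader; use the system loader.
      apiLoader = ClassLoader.getSystemClassLoader();
    }
    return new URLClassLoader(jobClasspath, apiLoader);
  }

  public static void main(String[] args) throws Exception {
    // Stand-in for an API class. Inside an OSGi container this would be
    // defined by a bundle class loader, not the system class loader.
    Class<?> apiClass = TaskClassLoaderSketch.class;
    try (URLClassLoader taskLoader = newTaskLoader(new URL[0], apiClass)) {
      // The task loader asks its parent first, so the API class resolves
      // to the same Class object the container already loaded.
      Class<?> resolved = taskLoader.loadClass(apiClass.getName());
      System.out.println(resolved == apiClass);
    }
  }
}
```

Because `URLClassLoader` delegates to its parent before searching its own URLs, job code loaded this way sees the same API classes as the surrounding container.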

      tasklog.patch: ensures the log files can be resolved when the child process is launched from a different directory than the parent process.
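
One way to picture the tasklog.patch problem: a relative log path means different things to two processes with different working directories. The sketch below shows one possible approach, assuming the parent's working directory is known to the child. The names `TaskLogPathSketch` and `resolveLogDir` are hypothetical and do not come from the patch.

```java
import java.io.File;

public class TaskLogPathSketch {

  /**
   * Resolves a possibly relative log dir against an explicit base directory
   * instead of the current JVM's working directory.
   * (Hypothetical helper, for illustration only.)
   */
  static File resolveLogDir(String logDir, File parentWorkingDir) {
    File dir = new File(logDir);
    if (!dir.isAbsolute()) {
      // Anchor relative paths to the parent's directory, not the child's.
      dir = new File(parentWorkingDir, logDir);
    }
    return dir.getAbsoluteFile();
  }

  public static void main(String[] args) {
    File parentCwd = new File(System.getProperty("user.dir"));
    // Parent and child both get the same absolute location for "logs".
    System.out.println(resolveLogDir("logs", parentCwd));
  }
}
```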

      taskrunner.patch: enables the TaskRunner to find a log dir when the parent JVM is not launched by the Hadoop scripts. It also allows a client to specify a substitute main class (which delegates to TaskTracker$Child); in this case that is for resolving OSGi classpaths, though it could perhaps be more general. Finally, it adds some extra logging for cases where things go wrong.
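
As a rough illustration of the log dir part of taskrunner.patch, the sketch below falls back to a default location when no log dir has been configured. It assumes the Hadoop scripts pass the log dir as the `hadoop.log.dir` system property; the fallback location (`logs` under a base directory) and the names `LogDirFallbackSketch` and `findLogDir` are hypothetical.

```java
import java.io.File;

public class LogDirFallbackSketch {

  /**
   * Returns the configured log dir, or a "logs" directory under the given
   * base when nothing was configured (for example, when the JVM was started
   * by an OSGi container rather than by the launch scripts).
   * (Hypothetical helper, for illustration only.)
   */
  static File findLogDir(String configured, File fallbackBase) {
    if (configured != null && !configured.trim().isEmpty()) {
      return new File(configured.trim()).getAbsoluteFile();
    }
    return new File(fallbackBase, "logs").getAbsoluteFile();
  }

  public static void main(String[] args) {
    // Assumption: the launch scripts set -Dhadoop.log.dir=...
    String configured = System.getProperty("hadoop.log.dir");
    File base = new File(System.getProperty("user.dir"));
    System.out.println(findLogDir(configured, base));
  }
}
```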

      tasktracker.patch: allows the parent to pass configuration through to the child TaskRunner (in this case, specifically to pass the classpath and launcher to the TaskRunner).

      1. taskrunner.patch
        2 kB
        David Savage
      2. tasklog.patch
        0.8 kB
        David Savage
      3. classpath.patch
        11 kB
        David Savage
      4. tasktracker.patch
        0.9 kB
        David Savage

          Activity

          Jean-Baptiste Onofré added a comment -

          I'm gonna submit a new set of patches, including Karaf features.

          Christophe Taton added a comment -

          Embedded web applications will need to be packaged as war files for Jetty6/OSGi to run correctly: Jetty is only able to use OSGi-specific URLs when reading a jar file (and thus a war file).

          Christophe Taton added a comment -

          It seems there is no easy way to run Jetty5 inside an OSGi container (more exactly, I did not manage to get it working after a couple of days spent debugging it).
          However, Jetty6 runs without problems in an OSGi environment.

          Christophe Taton added a comment -

          After playing with Hadoop inside OSGi containers for some time, here are some complementary comments:

          • There is an issue with the web UI: this is because resources inside Hadoop jars are referred to with OSGi-specific URLs (e.g. jar:bundle://<bundle-id>/path/to/resource) that the embedded Jetty is unable to use.
          • I am thinking Map/Reduce jobs could be packaged as OSGi bundles too: dependencies (like 3rd party libraries) are then handled directly by the containers.
          Doug Cutting added a comment -

          A single patch for this is probably best. Some comments:

          • Indentation is not Hadoop standard (2 spaces per level).
          • Non-existent files in the classpath should not throw exceptions, should they?
          • Some unit tests would be good to ensure that these changes are maintained.
          • Patches should not include patch-specific comments.
          • I don't like modifying the child's job configuration. Can't this be implemented by using 'final' parameters in the tasktracker's configuration, so that jobs cannot override them?
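
On the classpath point above, one possible reading is that missing entries should simply be skipped with a log message. The sketch below shows that behaviour under that assumption; the names `LenientClasspathSketch` and `toUrls` are hypothetical and are not taken from the attached patches.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

public class LenientClasspathSketch {

  /**
   * Converts a path-separator-delimited classpath into URLs, skipping
   * entries that do not exist instead of throwing.
   * (Hypothetical helper, for illustration only.)
   */
  static List<URL> toUrls(String classpath) {
    List<URL> urls = new ArrayList<>();
    for (String entry : classpath.split(File.pathSeparator)) {
      if (entry.isEmpty()) {
        continue;
      }
      File file = new File(entry);
      if (!file.exists()) {
        // Log and move on; a missing jar should not abort the task.
        System.err.println("Skipping missing classpath entry: " + entry);
        continue;
      }
      try {
        urls.add(file.toURI().toURL());
      } catch (MalformedURLException e) {
        System.err.println("Skipping unusable classpath entry: " + entry);
      }
    }
    return urls;
  }

  public static void main(String[] args) {
    // Print the entries of this JVM's own classpath that actually exist.
    System.out.println(toUrls(System.getProperty("java.class.path")));
  }
}
```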

            People

            • Assignee: Jean-Baptiste Onofré
            • Reporter: David Savage
            • Votes: 3
            • Watchers: 13
