  1. Hadoop Common
  2. HADOOP-913

dynamically loading C++ mapper/reducer classes in map/reduce jobs



    • New Feature
    • Status: Closed
    • Major
    • Resolution: Duplicate
    • None
    • 0.11.0
    • None
    • None


      It is highly desirable for the current map/reduce framework to be able to call functions written in C++ (or other languages).

      I am proposing a generic extension to the current framework to achieve the above goal.
      The extension is an application-level solution, similar in spirit to
      HadoopStreaming, and thus has no impact on Hadoop core.
      I will maintain the native map/reduce execution model.

      The basic idea is to use sockets/RPC to cross the language barrier.
      In particular, we can implement a generic mapper/reducer class in Java as a proxy for calling functions in another language.
      The configure function of the class will create a process that opens a user-specified shared library and acts as an RPC server.
      The map function of the class will just invoke an RPC call with the key/value pair.
      Such an RPC call is expected to return a list of key/value pairs, which the map function can then emit as its outputs.
      Below is a sketch of the generic class:

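      public class MapRedCPPAdapter implements Mapper, Reducer {
        String sharedLibraryName;
        RPCProxy theServer;

        // Start the helper process that loads the user-specified shared library.
        public void configure(JobConf job) {
          sharedLibraryName = job.get("shared.lib.name");
          theServer = createServer(sharedLibraryName);
        }

        // Shut the helper process down when the task finishes.
        public void close() {
          theServer.stop();
        }

        // Forward one key/value pair to the server and emit the pairs it returns.
        public void map(key, value, output, reporter) {
          ArrayList pairs = invokeRemoteMap(theServer, key, value);
          emit(pairs, output);
        }

        // Forward one key and its values to the server and emit the pairs it returns.
        public void reduce(key, values, output, reporter) {
          ArrayList pairs = invokeRemoteReduce(theServer, key, values);
          emit(pairs, output);
        }
      }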
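
      To make the round trip behind invokeRemoteMap more concrete, here is a minimal, self-contained sketch. It is not part of the proposal itself: the line-based wire format (one "key<TAB>value" request, answered by a pair count followed by that many "key<TAB>value" lines), the class name RpcMapSketch, and the helper names are all assumptions made for illustration. A Java thread stands in for the native process that would load the shared library, and its "map function" simply emits (word, "1") for each word in the value.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class RpcMapSketch {

    static BufferedReader reader(Socket s) throws IOException {
        return new BufferedReader(new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
    }

    static PrintWriter writer(Socket s) throws IOException {
        // autoFlush = true so every println is sent immediately
        return new PrintWriter(new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
    }

    /** Hypothetical stand-in for the native server: emits (word, "1") per word of the value. */
    static void startFakeNativeServer(ServerSocket listener) {
        Thread t = new Thread(() -> {
            try (Socket s = listener.accept();
                 BufferedReader in = reader(s);
                 PrintWriter out = writer(s)) {
                String line;
                while ((line = in.readLine()) != null) {
                    String[] kv = line.split("\t", 2);
                    String value = kv.length > 1 ? kv[1].trim() : "";
                    String[] words = value.isEmpty() ? new String[0] : value.split("\\s+");
                    out.println(words.length);          // header: number of pairs that follow
                    for (String w : words) {
                        out.println(w + "\t1");
                    }
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        t.setDaemon(true);
        t.start();
    }

    /** Client side of one call: send a pair, read back the list of output pairs. */
    static List<String[]> invokeRemoteMap(BufferedReader in, PrintWriter out,
                                          String key, String value) throws IOException {
        // Assumed format: keys and values must not contain tabs or newlines.
        out.println(key + "\t" + value);
        String header = in.readLine();
        if (header == null) {
            throw new IOException("server closed the connection");
        }
        int n = Integer.parseInt(header.trim());
        List<String[]> pairs = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            String line = in.readLine();
            if (line == null) {
                throw new IOException("truncated response");
            }
            pairs.add(line.split("\t", 2));
        }
        return pairs;
    }

    /** Starts the stand-in server on a loopback port, makes one call, returns the pairs. */
    static List<String[]> demo(String key, String value) {
        try (ServerSocket listener = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            startFakeNativeServer(listener);
            try (Socket s = new Socket(InetAddress.getLoopbackAddress(), listener.getLocalPort());
                 BufferedReader in = reader(s);
                 PrintWriter out = writer(s)) {
                return invokeRemoteMap(in, out, key, value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String[] kv : demo("0", "to be or not to be")) {
            System.out.println(kv[0] + "\t" + kv[1]);
        }
    }
}
```

      In a real adapter the connection would be opened once in configure and reused for every map call, so the per-record cost is one request/response exchange rather than a new connection.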


      The cons of this approach are the overhead associated with
      RPC calls and the creation of an additional process per mapper/reducer task.
      The pros are that the extension is clean, generic, and simple. It is applicable to other foreign languages too.





            Assignee: Unassigned
            Reporter: Runping Qi (runping)



