
Issue Links:
Dependants
This issue depends on:
HADOOP-1251: A method to get the InputSplit from a Mapper
HADOOP-1250: Remove the MustangFile class from streaming and promote the chmod into FileUtils
Resolution Date: 16/May/07 07:23 PM
MapReduce C++ support
Requirements
1. Allow users to write Map, Reduce, RecordReader, and RecordWriter functions in C++; the rest of the infrastructure already present in Java should be reused.
2. Avoid users having to write both Java and C++ for this to work.
3. Avoid users having to work with JNI methods directly by wrapping them in helper functions.
4. The interface should be SWIG'able.
made changes - 26/May/06 07:17 AM
| Field | Original Value | New Value |
| Attachment | | Hadoop MaReduce Developer doc.pdf [ 12334583 ] |
made changes - 30/May/06 06:58 PM
made changes - 31/May/06 07:25 AM
| Fix Version/s | | 0.4 [ 12311021 ] |
made changes - 06/Jun/06 06:16 AM
| Workflow | jira [ 12372161 ] | no reopen closed [ 12372946 ] |
made changes - 07/Jun/06 04:38 AM
| Workflow | no reopen closed [ 12372946 ] | no-reopen-closed [ 12373278 ] |
made changes - 29/Jun/06 04:11 AM
| Fix Version/s | 0.4.0 [ 12311021 ] | |
| Fix Version/s | | 0.5.0 [ 12311939 ] |
made changes - 03/Aug/06 05:46 PM
| Workflow | no-reopen-closed [ 12373278 ] | no-reopen-closed, patch-avail [ 12377476 ] |
made changes - 04/Aug/06 08:05 PM
| Fix Version/s | 0.5.0 [ 12311939 ] | |
| Fix Version/s | | 0.6.0 [ 12312025 ] |
made changes - 08/Sep/06 08:23 PM
| Fix Version/s | 0.6.0 [ 12312025 ] | |
made changes - 15/Dec/06 09:40 PM
| Assignee | | Owen O'Malley [ owen.omalley ] |
made changes - 18/Feb/07 08:27 AM
| Fix Version/s | | 0.12.0 [ 12312293 ] |
|
Description
|
MapReduce C++ support
Requirements
1. Allow users to write Map and Reduce functions in C++; the rest of the infrastructure already present in Java should be reused.
2. Avoid users having to write both Java and C++ for this to work.
3. Avoid users having to work with JNI methods directly by wrapping them in helper functions.
4. Use Record IO for describing the record format; both the Java MR framework and the C++ code should use the same format so that they work together seamlessly.
5. Allow users to write simple map reduce tasks without learning Record IO if keys and values are simple strings.
Implementation notes
- If keys and values are simple strings, the user passes SimpleNativeMapper in JobConf and implements the mapper and reducer methods in C++.
- For composite Record IO types, the user starts by defining a record format using the Record IO DDL.
- The user generates Java and C++ classes from the DDL using Record IO.
- The user configures JobConf to use the generated Java classes as the MR input/output key/value classes.
- The user writes Map and Reduce functions in C++ using a standard interface (given below). This interface makes the serialized Record IO format available to the C++ function, which should deserialize it into the corresponding generated C++ Record IO classes.
- The user uses the helper functions to pass the serialized form of the generated output key/value pairs to the output collector.
The following is pseudocode for the Mapper (the Reducer can be implemented similarly):
Native(JNI) Java proxy for the Mapper :
---------------------------------------
Without Record IO :-
--------------------
public class SimpleNativeMapper extends MapReduceBase implements Mapper {
  /**
   * Works on simple strings.
   */
  public void map(WritableComparable key, Writable value,
      OutputCollector output, Reporter reporter) throws IOException {
    mapNative(key.toString().getBytes(),
        value.toString().getBytes(), output, reporter);
  }

  /**
   * Native implementation.
   */
  private native void mapNative(byte[] key, byte[] value,
      OutputCollector output, Reporter reporter) throws IOException;
}
With Record IO :-
------------------
public class RecordIONativeMapper extends MapReduceBase implements Mapper {
  /**
   * Implementation of the map method. It acts as a JNI proxy for the actual
   * map method implemented in C++. Works for Record IO based records.
   * @see #mapNative(byte[], byte[], OutputCollector, Reporter)
   */
  public void map(WritableComparable key, Writable value,
      OutputCollector output, Reporter reporter) throws IOException {
    byte[] keyBytes = null;
    byte[] valueBytes = null;
    try {
      // We need to serialize the key and the value and pass the serialized
      // form to the C++ / JNI methods so they can interpret it using the
      // appropriate Record IO classes.
      {
        ByteArrayOutputStream keyStream = new ByteArrayOutputStream();
        BinaryOutputArchive boa = new BinaryOutputArchive(new DataOutputStream(keyStream));
        ((Record) key).serialize(boa, "WhatIsTag");
        keyBytes = keyStream.toByteArray();
      }
      {
        ByteArrayOutputStream valueStream = new ByteArrayOutputStream();
        BinaryOutputArchive boa = new BinaryOutputArchive(new DataOutputStream(valueStream));
        // Serialize the value here, not the key a second time.
        ((Record) value).serialize(boa, "WhatIsTag");
        valueBytes = valueStream.toByteArray();
      }
    } catch (ClassCastException e) {
      // throw better exceptions
      throw new IOException("Input record must be of Record IO Type");
    }
    // Pass the serialized byte[] to the C++ implementation.
    mapNative(keyBytes, valueBytes, output, reporter);
  }

  /**
   * Implementation in C++.
   */
  private native void mapNative(byte[] key, byte[] value,
      OutputCollector output, Reporter reporter) throws IOException;
}
OutputCollector Proxy for C++
------------------------------
public class NativeOutputCollector implements OutputCollector {
  // Standard method from the interface.
  public void collect(WritableComparable key, Writable value)
      throws IOException {
  }

  // Deserializes key and value and calls collect(WritableComparable, Writable).
  public void collectFromNative(byte[] key, byte[] value) throws IOException {
    // Deserialize key and value to the Java types configured in JobConf.
    // Call the actual collect method.
  }
}
Core Native functions ( helper for user provided Mapper and Reducer )
---------------------------------------------------------------------
#include "org_apache_hadoop_mapred_NativeMapper.h"
#include "UserMapper.h"
/**
 * A C++ proxy method that calls the actual implementation of the Mapper. This
 * method signature is generated by javah.
 */
JNIEXPORT void JNICALL Java_org_apache_hadoop_mapred_NativeMapper_mapNative
  (JNIEnv *env, jobject thisObj, jbyteArray key, jbyteArray value,
   jobject output_collector, jobject reporter)
{
  // Get the raw bytes and pass them on to the user-defined map method.
  // The user's map method should take care of converting them to the correct
  // Record IO type. The buffers are not NUL-terminated; keyLen and valueLen
  // hold their sizes.
  jsize keyLen = env->GetArrayLength(key);
  jsize valueLen = env->GetArrayLength(value);
  jbyte *keyBuf = env->GetByteArrayElements(key, NULL);
  jbyte *valueBuf = env->GetByteArrayElements(value, NULL);
  // Call the user-defined method.
  user_mapper(reinterpret_cast<const char *>(keyBuf),
              reinterpret_cast<const char *>(valueBuf),
              output_collector, reporter);
  // JNI_ABORT: release the buffers without copying changes back.
  env->ReleaseByteArrayElements(key, keyBuf, JNI_ABORT);
  env->ReleaseByteArrayElements(value, valueBuf, JNI_ABORT);
}
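The byte arrays handed over by JNI carry no terminating NUL, so the length has to travel with the pointer. A minimal sketch of that conversion follows. The helper name `bytes_to_string` and the `byte_t` typedef (a stand-in for JNI's `jbyte`) are assumptions made for illustration and are not part of the proposal.

```cpp
#include <cstddef>
#include <string>

// Stand-in for JNI's jbyte (a signed 8-bit integer); an assumption of this
// sketch so that it compiles without jni.h.
typedef signed char byte_t;

// Hypothetical helper: copies exactly `len` bytes into a std::string, so
// embedded zero bytes survive and no terminator is required.
std::string bytes_to_string(const byte_t *buf, std::size_t len) {
    return std::string(reinterpret_cast<const char *>(buf), len);
}
```

A proxy could call such a helper with the element pointer and the array length before releasing the JNI buffers, which also means the user code never holds a pointer into JVM-managed memory.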
/**
 * Helper method that acts as a proxy to the OutputCollector in Java. key and
 * value must be serialized forms of the records specified in JobConf.
 */
void output_collector(const char *key, const char *value,
    jobject output_collector, jobject reporter) {
  // Invoke Java NativeOutputCollector.collectFromNative with key and value.
}
User defined Mapper ( and Reducer )
------------------------------------
/**
 * Implements the user-defined map operation.
 */
void user_mapper(const char *key, const char *value, jobject collector, jobject reporter) {
  // 1. Deserialize key/value into the appropriate format using Record IO.
  // 2. Process key/value and generate the intermediate key/values in Record IO format.
  // 3. Serialize the intermediate key/values to intermed_key and intermed_value.
  // 4. Pass intermed_key/intermed_value on using the helper function:
  //    output_collector(intermed_key, intermed_value, collector, reporter);
}
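For the simple-string case, a user mapper could look like the sketch below. It is an illustration under stated assumptions, not the proposed interface: the `Collector` struct is a hypothetical stand-in for the Java-side collector reached through the helper function, and the mapper takes `std::string` arguments instead of raw buffers and JNI handles so that the example is self-contained.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the Java-side output collector; here it only
// records the emitted key/value pairs in memory.
struct Collector {
    std::vector<std::pair<std::string, std::string> > pairs;
    void collect(const std::string &key, const std::string &value) {
        pairs.push_back(std::make_pair(key, value));
    }
};

// Sketch of a user-defined mapper for plain-string input: the key is
// ignored and every whitespace-separated word in the value is emitted
// with the count "1".
void user_mapper(const std::string &key, const std::string &value,
                 Collector &collector) {
    (void)key;  // unused in this example
    std::istringstream words(value);
    std::string word;
    while (words >> word) {
        collector.collect(word, "1");
    }
}
```

With Record IO types, steps 1 and 3 of the outline would replace the plain string handling with deserialization into, and serialization from, the generated C++ record classes.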
|
MapReduce C++ support
Requirements
1. Allow users to write Map, Reduce, RecordReader, and RecordWriter functions in C++; the rest of the infrastructure already present in Java should be reused.
2. Avoid users having to write both Java and C++ for this to work.
3. Avoid users having to work with JNI methods directly by wrapping them in helper functions.
4. The interface should be SWIG'able.
|
made changes - 18/Feb/07 08:30 AM
| Attachment | Hadoop MaReduce Developer doc.pdf [ 12334583 ] | |
made changes - 18/Feb/07 08:31 AM
| Status | Open [ 1 ] | In Progress [ 3 ] |
made changes - 02/Mar/07 10:17 PM
| Fix Version/s | 0.12.0 [ 12312293 ] | |
made changes - 18/Apr/07 10:30 PM
| Attachment | | pipes.patch [ 12355795 ] |
made changes - 20/Apr/07 08:50 PM
| Summary | Support for writing Map/Reduce functions in C++ | Hadoop Pipes for writing map/reduce jobs in C++ and python |
made changes - 16/May/07 12:17 AM
| Component/s | | pipes [ 12311773 ] |
| Component/s | mapred [ 12310690 ] | |
| Fix Version/s | | 0.14.0 [ 12312474 ] |
made changes - 16/May/07 12:21 AM
| Attachment | | pipes-2.patch [ 12357433 ] |
made changes - 16/May/07 12:22 AM
| Status | In Progress [ 3 ] | Patch Available [ 10002 ] |
made changes - 16/May/07 07:23 PM
| Status | Patch Available [ 10002 ] | Resolved [ 5 ] |
| Resolution | | Fixed [ 1 ] |
made changes - 20/Aug/07 06:11 PM
| Status | Resolved [ 5 ] | Closed [ 6 ] |
made changes - 08/Jul/09 04:41 PM
| Component/s | pipes [ 12311773 ] | |