Runping,
The MultipleOutputFormat of the MO* API makes it easy to define named outputs, each with its own configuration (output format, key class, value class), and to use them in the usual collector way. The MO code takes care of namespacing the file names and of creating and closing the files. Alejandro, your patch is more complicated and less general than the already committed patch. I would suggest that you write a subclass of
Looking over your and Runping's patches, I'd suggest defining a subclass that looks like:
package org.apache.hadoop.mapred.lib;
public class KeyValue<K,V> {
private K key;
private V value;
public KeyValue();
public KeyValue(K key, V value);
public K getKey();
public V getValue();
public void setKey(K k);
public void setValue(V v);
}
public class MultipleOutputStreams extends MultipleOutputFormat {
// modify job conf to control how to format a given stream
// should be called once for each stream kind
public static void addOutputStream(JobConf conf, String kind, Class<? extends OutputFormat> outFormat, Class<?> keyClass, Class<?> valueClass);
}
So client code would look like:
In launcher:
MultipleOutputStreams.addOutputStream(job, "foo", SequenceFileOutputFormat.class, Text.class, IntWritable.class);
MultipleOutputStreams.addOutputStream(job, "bar", TextOutputFormat.class, Text.class, Text.class);
In reducer:
out.collect("foo", new KeyValue(new Text("hi"), new IntWritable(12)));
out.collect("bar", new KeyValue(k2, v2));
Owen, let me try filling the gaps in your suggestion.
Follow up comments/questions:
Thoughts?
The key to using the MultipleOutputFormat class is to define a subclass that implements
abstract protected RecordWriter<K, V> getBaseRecordWriter(FileSystem fs, JobConf job, String name, Progressable arg3) throws IOException;
MultipleSequenceFileOutputFormat and MultipleTextOutputFormat are two simple but commonly used subclasses.
Got it.
Implemented getBaseRecordWriter using a trick to work with arbitrary OutputFormats. I had to modify the getBaseRecordWriter signature to pass the leafName as a parameter, to be able to obtain the corresponding OutputFormat/Key/Value classes. It took me a while to understand how RecordWriters are cached within a single RecordWriter; IMO it is somewhat convoluted. The current MultipleSequenceFileOutputFormat has a limitation: it only works with the same key and value classes as the standard job output.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12379107/patch3149.txt
against trunk revision 643282.
@author +1. The patch does not contain any @author tags.
tests included +1. The patch appears to include 3 new or modified tests.
javadoc -1. The javadoc tool appears to have generated 1 warning messages.
javac -1. The applied patch generated 524 javac compiler warnings (more than the trunk's current 521 warnings).
release audit +1. The applied patch does not generate any new release audit warnings.
findbugs -1. The patch appears to introduce 2 new Findbugs warnings.
core tests +1. The patch passed core unit tests.
contrib tests +1. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2121/testReport/
This message is automatically generated.
Corrections to indentation/style problems in new files.
Ignoring indentation problems in modified files on lines the patch does not touch.
On the Findbugs report for MultipleOutputTask, regarding "IS2_INCONSISTENT_SYNC: Inconsistent synchronization": the unsynchronized access is in configure() and the synchronized access is during writes. The variable in question is conf, which is only read.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12379128/patch3149.txt
against trunk revision 643282.
@author +1. The patch does not contain any @author tags.
tests included +1. The patch appears to include 3 new or modified tests.
javadoc -1. The javadoc tool appears to have generated 1 warning messages.
javac -1. The applied patch generated 524 javac compiler warnings (more than the trunk's current 521 warnings).
release audit +1. The applied patch does not generate any new release audit warnings.
findbugs -1. The patch appears to introduce 1 new Findbugs warnings.
core tests +1. The patch passed core unit tests.
contrib tests +1. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2124/testReport/
This message is automatically generated.
You've got to fix the findbugs warning and javadoc warnings.
You also missed the point of the KeyValue class. The point was to make:
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(KeyValue.class);
so that you can emit the same type out of the map/reduce. The multiple output format then generates the real key and value by pulling them out of the KeyValue object. I don't see why you feel the need to limit the names to [a-zA-Z0-9]. Surely it doesn't matter if they use an underscore or a period... There should not be multiple output task or multiple output collector classes...
Findbugs: fixed one; the second one (IS) is a false positive (explained in a previous comment).
Fixing checkstyle warnings in the next version of the patch.
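The KeyValue idea Owen describes (the job emits the stream kind plus a KeyValue, and the format unwraps the real key and value for that stream) can be sketched without Hadoop. StreamRouter, its in-memory record lists and the runtime type check are hypothetical stand-ins for MultipleOutputStreams and the per-stream RecordWriters, not the real API:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Wrapper from the proposal: carries the real key and value for one stream.
class KeyValue<K, V> {
  private K key;
  private V value;
  public KeyValue() {}
  public KeyValue(K key, V value) { this.key = key; this.value = value; }
  public K getKey() { return key; }
  public V getValue() { return value; }
  public void setKey(K k) { key = k; }
  public void setValue(V v) { value = v; }
}

// Hypothetical router: the job-level key is the stream kind ("foo", "bar"),
// the job-level value is a KeyValue holding the real pair for that stream.
class StreamRouter {
  private final Map<String, Class<?>[]> streamTypes = new HashMap<>();
  private final Map<String, List<String>> written = new HashMap<>();

  // Analogous to addOutputStream(): remember the key/value classes per kind.
  void addOutputStream(String kind, Class<?> keyClass, Class<?> valueClass) {
    streamTypes.put(kind, new Class<?>[] {keyClass, valueClass});
    written.put(kind, new ArrayList<>());
  }

  // Unwraps the KeyValue and checks it against the registered classes.
  void collect(String kind, KeyValue<?, ?> kv) {
    Class<?>[] types = streamTypes.get(kind);
    if (types == null) {
      throw new IllegalArgumentException("Unknown stream: " + kind);
    }
    if (!types[0].isInstance(kv.getKey()) || !types[1].isInstance(kv.getValue())) {
      throw new IllegalArgumentException("Wrong key/value types for " + kind);
    }
    written.get(kind).add(kv.getKey() + "\t" + kv.getValue());
  }

  List<String> records(String kind) {
    return written.get(kind);
  }
}
```

Because every stream shares the single KeyValue value class at the job level, the per-stream types can only be checked at runtime, which is the trade-off debated further down the thread.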
Now I understand what you have in mind with the KeyValue object and not having MultipleOutput* classes. I have the following comments on that:
I would rather refactor code to:
Add a collect(String namedOutput, WritableComparable key, Writable value) method to MultipleOutputs; the usage pattern would be:
public class MyReducer implements Reducer {
private MultipleOutputs mos;
public void configure(JobConf conf) {
mos = new MultipleOutputs(conf);
}
public void reduce(WritableComparable key, Iterator<Writable> values, OutputCollector collector, Reporter reporter) throws IOException {
Writable value = values.next();
collector.collect(key,value);
mos.collect("aa", key, value);
}
public void close() throws IOException {
mos.close();
}
}
Configuration of the job prior to dispatching would remain the same.
Modified patch as per my last comments.
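The MyReducer usage pattern above (a collect(namedOutput, key, value) call next to the regular collector, with everything closed in close()) boils down to a small lifecycle. A minimal Hadoop-free sketch, assuming in-memory StringWriters and a tab-separated record layout as hypothetical stand-ins for the patch's record writers: named outputs are declared up front, writers are opened lazily on first use, and all of them are closed together.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the MultipleOutputs lifecycle, not the patch itself.
class SimpleMultipleOutputs {
  private final Set<String> namedOutputs;
  private final Map<String, Writer> writers = new LinkedHashMap<>();
  private final Map<String, String> closedContents = new LinkedHashMap<>();

  // Named outputs are fixed at job configuration time.
  SimpleMultipleOutputs(Set<String> namedOutputs) {
    this.namedOutputs = new LinkedHashSet<>(namedOutputs);
  }

  void collect(String namedOutput, Object key, Object value) throws IOException {
    if (!namedOutputs.contains(namedOutput)) {
      throw new IllegalArgumentException("Undefined named output: " + namedOutput);
    }
    // Open the writer for this named output only when it is first used.
    Writer w = writers.computeIfAbsent(namedOutput, n -> new StringWriter());
    w.write(key + "\t" + value + "\n");
  }

  // Mirrors mos.close() in the reducer's close(): closes every open writer.
  void close() throws IOException {
    for (Map.Entry<String, Writer> e : writers.entrySet()) {
      e.getValue().close();
      closedContents.put(e.getKey(), e.getValue().toString());
    }
    writers.clear();
  }

  String contents(String namedOutput) {
    return closedContents.get(namedOutput);
  }
}
```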
Alejandro,
Having a private mos in the mapper/reducer class, in addition to the standard collector passed in through the map/reduce call, is really ugly and redundant, and is not compatible with the common patterns of applications using the map/reduce framework. I have found it most common that all the output keys/values are of the same types.
Runping,
Yes, I've got it. Please look at my last patch. If the consensus is that MultipleOutputs should not be in Hadoop, then I would request the proposed signature change to the getBaseRecordWriter method, so that it receives the original leafName. Thanks.
Alejandro,
If you look at the code carefully, you will notice that the value passed to the name argument of the getBaseRecordWriter method goes through the following:
final String myName = generateLeafFileName(name);
String keyBasedPath = generateFileNameForKeyValue(key, value, myName);
// get the file name based on the input file name
String finalPath = getInputFileBasedOutputFileName(myJob, keyBasedPath);
The methods generateLeafFileName, generateFileNameForKeyValue and getInputFileBasedOutputFileName are
Hope this helps.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12379227/patch3149.txt
against trunk revision 643282.
@author +1. The patch does not contain any @author tags.
tests included +1. The patch appears to include 3 new or modified tests.
javadoc -1. The javadoc tool appears to have generated 1 warning messages.
javac -1. The applied patch generated 511 javac compiler warnings (more than the trunk's current 510 warnings).
release audit +1. The applied patch does not generate any new release audit warnings.
findbugs +1. The patch does not introduce any new Findbugs warnings.
core tests +1. The patch passed core unit tests.
contrib tests +1. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2139/testReport/
This message is automatically generated.
Runping, I'm confused about why there are three steps to generating the name. It seems really confusing. Why not have:
String finalPath = generateLeafFileName(name, key, job);
possibly with the value, if it really makes sense. Of course, that would be a separate bug... I think this patch is still too big and complex for its goal (a simpler and friendlier interface to the multiple output format).
I don't understand your objection to the KeyValue class. It is designed precisely so that you can emit different key and value types to each of the base output formats from your reducer (or mapper, if there is no reduce). Using a wrapper class makes the polymorphism easy and is cheap in terms of performance. I do agree that getBaseRecordWriter should get the virtual key (aka leaf name).
>> possibly with the value, if it really makes sense. Of course, that would be a separate bug...
Yes, it makes sense to combine the first two steps. Please open a new JIRA and we can address it there.
My concern with the KeyValue class and how it would be used is that it is not an intuitive use of the OutputCollector. The expectation is that the collect(key, value) method takes a key and a value; with KeyValue, the key is the named output and the value is the key/value pair to output. IMO this is awkward.
While the MultipleOutputFormat provides a powerful mechanism for using the key/value to redirect output in very clever ways, I still think there should be a simple API to just write key/values to different outputs and, as you suggested, one based on the MultipleOutputFormat. Regarding the finalPath generation and the 2 steps, I agree that the way things are done is confusing. Still, I think the 2 steps are needed: one to generate the file name without partition info and a second to add the partition info, both to avoid name collisions. Perhaps the current method names are misleading, hence the confusion. Also, in getRecordWriter(), in the anonymous RecordWriter being returned, the TreeMap that stores the different record writers is keyed by the finalPath of the file; shouldn't it be keyed by the keyBasedPath?
This feature missed the 0.17 feature freeze.
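To make the two naming steps and the writer cache concrete, here is a minimal plain-Java sketch: one outer writer derives a file name per record (step 1, no partition info), appends the task's partition (step 2), and caches lazily created per-file writers in a TreeMap keyed by that final path. KeyPartitionedWriter, the "out_" prefix and the "%s-%05d" suffix are hypothetical; in-memory lists stand in for RecordWriters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch of the caching pattern, not MultipleOutputFormat itself.
class KeyPartitionedWriter {
  private final int partition;
  private final TreeMap<String, List<String>> files = new TreeMap<>();

  KeyPartitionedWriter(int partition) {
    this.partition = partition;
  }

  // Step 1: a file name derived from the record, without partition info.
  static String nameForKey(Object key) {
    return "out_" + key;
  }

  // Step 2: append the task's partition so that different tasks producing
  // the same key-based name do not collide.
  String withPartition(String name) {
    return String.format("%s-%05d", name, partition);
  }

  void write(Object key, Object value) {
    String finalPath = withPartition(nameForKey(key));
    // Create the per-file writer only on first use, then reuse it.
    files.computeIfAbsent(finalPath, p -> new ArrayList<>()).add(key + "\t" + value);
  }

  TreeMap<String, List<String>> files() {
    return files;
  }
}
```

In this sketch the partition is fixed for the lifetime of the writer, so keying the cache by the final path or by the key-based name groups records identically; the question above is which key the real implementation should use.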
Removed dependency on the MultipleOutputFormat.
The code became simpler, as MultipleOutputs specifically addresses the case of defining a fixed set of additional outputs during job configuration. As before, the patch allows the different outputs to have OutputFormat and key/value classes that differ from the job output configuration and from each other. Sample usage is included in the Javadoc.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12380141/patch3149.txt
against trunk revision 645773.
@author +1. The patch does not contain any @author tags.
tests included +1. The patch appears to include 3 new or modified tests.
javadoc -1. The javadoc tool appears to have generated 1 warning messages.
javac +1. The applied patch does not generate any new javac compiler warnings.
release audit +1. The applied patch does not generate any new release audit warnings.
findbugs +1. The patch does not introduce any new Findbugs warnings.
core tests +1. The patch passed core unit tests.
contrib tests +1. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2235/testReport/
This message is automatically generated.
Fixing javadoc warning.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12380157/patch3149.txt
against trunk revision 645773.
@author +1. The patch does not contain any @author tags.
tests included +1. The patch appears to include 3 new or modified tests.
javadoc -1. The javadoc tool appears to have generated 1 warning messages.
javac +1. The applied patch does not generate any new javac compiler warnings.
release audit +1. The applied patch does not generate any new release audit warnings.
findbugs +1. The patch does not introduce any new Findbugs warnings.
core tests -1. The patch failed core unit tests.
contrib tests +1. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2238/testReport/
This message is automatically generated.
This patch adds a property with the name of the current namedOutput to the JobConf used to create the RecordWriter.
Using this property, custom OutputFormats can obtain the name of the namedOutput being created and do any namedOutput-specific configuration.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12380363/patch3149.txt
against trunk revision 645773.
@author +1. The patch does not contain any @author tags.
tests included +1. The patch appears to include 3 new or modified tests.
javadoc -1. The javadoc tool appears to have generated 1 warning messages.
javac -1. The applied patch generated 483 javac compiler warnings (more than the trunk's current 482 warnings).
release audit +1. The applied patch does not generate any new release audit warnings.
findbugs +1. The patch does not introduce any new Findbugs warnings.
core tests -1. The patch failed core unit tests.
contrib tests +1. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2260/testReport/
This message is automatically generated.
This is getting close, but you still have 8 javadoc warnings:
[javadoc] Building tree for all the packages and classes...
[javadoc] /zonestorage/hudson/home/hudson/hudson/jobs/Hadoop-Patch/workspace/trunk/src/java/org/apache/hadoop/mapred/lib/MultipleOutputs.java:99: warning - Tag @link: missing '#': "addNamedOutput()"
[javadoc] /zonestorage/hudson/home/hudson/hudson/jobs/Hadoop-Patch/workspace/trunk/src/java/org/apache/hadoop/mapred/lib/MultipleOutputs.java:99: warning - Tag @link: can't find addNamedOutput() in org.apache.hadoop.mapred.lib.MultipleOutputs
[javadoc] /zonestorage/hudson/home/hudson/hudson/jobs/Hadoop-Patch/workspace/trunk/src/java/org/apache/hadoop/mapred/lib/MultipleOutputs.java:99: warning - Tag @link: reference not found: MultipleOutputMapper
[javadoc] /zonestorage/hudson/home/hudson/hudson/jobs/Hadoop-Patch/workspace/trunk/src/java/org/apache/hadoop/mapred/lib/MultipleOutputs.java:99: warning - Tag @link: reference not found: MultipleOutputReducer
[javadoc] /zonestorage/hudson/home/hudson/hudson/jobs/Hadoop-Patch/workspace/trunk/src/java/org/apache/hadoop/mapred/lib/MultipleOutputs.java:99: warning - Tag @link: reference not found: MultipleOutputMapper
[javadoc] /zonestorage/hudson/home/hudson/hudson/jobs/Hadoop-Patch/workspace/trunk/src/java/org/apache/hadoop/mapred/lib/MultipleOutputs.java:99: warning - Tag @link: reference not found: MultipleOutputReducer
[javadoc] Building index for all the packages and classes...
[javadoc] /zonestorage/hudson/home/hudson/hudson/jobs/Hadoop-Patch/workspace/trunk/src/java/org/apache/hadoop/mapred/lib/MultipleOutputs.java:99: warning - Tag @link: reference not found: MultipleOutputMapper
[javadoc] /zonestorage/hudson/home/hudson/hudson/jobs/Hadoop-Patch/workspace/trunk/src/java/org/apache/hadoop/mapred/lib/MultipleOutputs.java:99: warning - Tag @link: reference not found: MultipleOutputReducer
that you need to fix. You should also remove the references to Writable and WritableComparable and make them Object instead.
I'm still not wild about having 2 completely separate interfaces for multiple outputs in the library, but this one is easier to use, if less flexible...
Fixing javadocs.
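The namedOutput property that the patch adds to the JobConf (described a few comments above) lets a custom OutputFormat specialize its configuration per named output. A minimal plain-Java sketch of that idea, assuming a Map<String, String> stands in for JobConf; the property name example.named.output and the per-output compression key are hypothetical, not the names used by the patch:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: builds the conf handed to the OutputFormat for one
// named output.
class NamedOutputConf {
  // Hypothetical property name, not necessarily the one used by the patch.
  static final String NAMED_OUTPUT_PROPERTY = "example.named.output";

  // Copy the job conf and record which named output is being created, so
  // the job-wide conf itself is left untouched.
  static Map<String, String> confFor(Map<String, String> jobConf, String namedOutput) {
    Map<String, String> copy = new HashMap<>(jobConf);
    copy.put(NAMED_OUTPUT_PROPERTY, namedOutput);
    return copy;
  }
}

// Stand-in for a custom OutputFormat that reads per-output settings.
class NamedOutputAwareFormat {
  String compressionFor(Map<String, String> conf) {
    String name = conf.get(NamedOutputConf.NAMED_OUTPUT_PROPERTY);
    return conf.getOrDefault("example." + name + ".compression", "none");
  }
}
```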
>> You should also remove the references to Writable and WritableComparable and make them Object instead.
Where? Do you mean in the MultipleOutputs.collect() method or in the static methods for configuration?
Fixing javadocs.
Regarding the removal of Writable and WritableComparable in favor of Object: if I understand things correctly, all APIs used directly from mapper/reducer code should be typed or generified to enforce as much type safety as possible at compile time, while internal APIs can be more lax. If my assumption is correct, then this API should use Writable/WritableComparable and generics.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12380891/patch3149.txt
against trunk revision 645773.
@author +1. The patch does not contain any @author tags.
tests included +1. The patch appears to include 3 new or modified tests.
javadoc +1. The javadoc tool did not generate any warning messages.
javac -1. The applied patch generated 483 javac compiler warnings (more than the trunk's current 482 warnings).
release audit +1. The applied patch does not generate any new release audit warnings.
findbugs +1. The patch does not introduce any new Findbugs warnings.
core tests +1. The patch passed core unit tests.
contrib tests +1. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2323/testReport/
This message is automatically generated.
Taking care of javac warning (in TestCase).
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12381415/patch3149.txt
against trunk revision 653264.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2395/testReport/
This message is automatically generated.
Regarding Owen's comment "You should also remove the references to Writable and WritableComparable and make them Object instead":
Assuming that it refers to generified Objects, I still think it does not apply to removing the use of WritableComparable and Writable from MultipleOutputs.collect(). The K and V classes used in the different outputs of a MultipleOutputs may differ among the outputs, and they may differ from the K and V classes used by the M/R. Because of this, they cannot be bound to the K and V generics of the Mapper and Reducer classes, so they have to accept the base class. For example:
JobConf conf = new JobConf();
conf.setMapperClass(MOMap.class);
conf.setReducerClass(MOReduce.class);
conf.setInputPath(inDir);
conf.setInputFormat(TextInputFormat.class);
conf.setMapOutputKeyClass(LongWritable.class);
conf.setMapOutputValueClass(Text.class);
FileOutputFormat.setOutputPath(conf, outDir);
conf.setOutputFormat(TextOutputFormat.class);
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(Text.class);
MultipleOutputs.addNamedOutput(conf, "text", TextOutputFormat.class, LongWritable.class, Text.class);
MultipleOutputs.addNamedOutput(conf, "sequence", SequenceFileOutputFormat.class, Text.class, MapWritable.class);
...
public class MOReduce implements Reducer<LongWritable, Text, Text, Text> {
private MultipleOutputs mos;
public void configure(JobConf conf) {
mos = new MultipleOutputs(conf);
}
public void reduce(LongWritable key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
while (values.hasNext()) {
Text value = values.next();
output.collect(new Text("a"), value);
mos.collect("text", key, new Text("Hello"), reporter);
mos.collect("sequence", value, new MapWritable(), reporter);
}
}
public void close() throws IOException {
mos.close();
}
}
How about the following API change in the MultipleOutputs?
Instead of the void collect(String, WritableComparable, Writable) method, have an OutputCollector getCollector(String namedOutput, Reporter reporter) method. This would bring consistency to the API used to write K/V pairs, and it would allow reusing code that accepts OutputCollectors to write K/V pairs to named outputs. For example:
public void base64AndWrite(WritableComparable key, Writable value, OutputCollector<Text,Text> collector);
This method could then be used with the M/R OutputCollectors and reused for MultipleOutputs.
No, if you look at the current (0.17 via
public interface OutputCollector<K,V> {
void collect(K key, V value) throws IOException;
}
instead of:
public interface OutputCollector<K extends WritableComparable, V extends Writable> {
void collect(K key, V value) throws IOException;
}
So your interface should be compatible and take Objects instead of Writables...
Finally understood (I was not aware of HADOOP-1986).
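The getCollector(namedOutput, reporter) proposal rests on the collector interface having unbounded K and V, as in the 0.17 OutputCollector<K,V> quoted above: a helper written against that interface works for the job output and for any named output. A minimal plain-Java sketch, assuming a hypothetical KVCollector (same shape as OutputCollector, minus the IOException) and hypothetical NamedCollectors/CollectorHelpers classes:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Same shape as the 0.17 OutputCollector<K,V> (no Writable bounds on K and V),
// minus the IOException, to keep the sketch small.
interface KVCollector<K, V> {
  void collect(K key, V value);
}

// Hypothetical sketch of getCollector(namedOutput): each named output is
// handed out through the same collector interface as the job output.
class NamedCollectors {
  private final Map<String, List<String>> outputs = new HashMap<>();

  KVCollector<Object, Object> getCollector(String namedOutput) {
    List<String> sink = outputs.computeIfAbsent(namedOutput, n -> new ArrayList<>());
    return (key, value) -> sink.add(key + "\t" + value);
  }

  List<String> records(String namedOutput) {
    return outputs.getOrDefault(namedOutput, new ArrayList<>());
  }
}

class CollectorHelpers {
  // In the spirit of base64AndWrite(): it depends only on the collector
  // interface, so the same code serves the job output and any named output.
  static void base64AndWrite(Object key, Object value,
      KVCollector<? super String, ? super String> out) {
    String encoded = Base64.getEncoder()
        .encodeToString(value.toString().getBytes(StandardCharsets.UTF_8));
    out.collect(key.toString(), encoded);
  }
}
```

With the `? super String` bounds, the helper accepts both a `KVCollector<String, String>` for the job output and the `KVCollector<Object, Object>` returned for a named output.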
Removing use of WritableComparable and Writable.
The only remaining question would be the refactoring to return an OutputCollector instead of having the collect() method (see my previous comment on that).
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12381855/patch3149.txt
against trunk revision 655410.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2450/testReport/
This message is automatically generated.
Alejandro,
Sorry it took so long to get back to this one! I'm still troubled about having two classes in mapred.lib that do almost exactly the same thing. What changes to Runping's interface would make you happy with it? On a more pragmatic note, you don't need the InternalFileOutputFormat; just put it into MultipleOutputs. It also bothered me to see you getting the default file system, but it is fine because you are just feeding it to the output format, which shouldn't be using the file system parameter anyway.
For our requirements, the MultipleOutputFormat shortcomings (which are handled by the proposed patch) are:
From the M/R developer usage perspective:
IMO, the current MOF has great flexibility, but that comes at a cost in complexity, both in using it and in understanding it. On your comment about not needing the InternalFileOutputFormat: you are right, I could make MultipleOutputs extend FileOutputFormat and have the getRecordWriter() method there. On your comment about using the default file system: I did that because the OutputFormat interface indicates that it is ignored, and because the file to be created goes in the same place as the job output, hence using the job conf for it.
Incorporated 3258 into the patch, as that issue was meant to address a need of this issue.
Added support for multi named outputs: the ability to create, dynamically from the job, multiple distinct outputs that use the same OutputFormat, key class and value class.
Refactored to use the OutputCollector interface for writing to the named outputs.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12384642/patch3149.txt
against trunk revision 671385.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 6 new or modified tests.
-1 javadoc. The javadoc tool appears to have generated 1 warning messages.
-1 javac. The applied patch generated 447 javac compiler warnings (more than the trunk's current 442 warnings).
-1 findbugs. The patch appears to introduce 1 new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2730/testReport/
This message is automatically generated.
Fixing MultipleOutputs javadoc to avoid warning.
The warnings on FileOutputFormat are in areas of the file not affected by this patch.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12384653/patch3149.txt
against trunk revision 671385.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 6 new or modified tests.
-1 javadoc. The javadoc tool appears to have generated 1 warning messages.
-1 javac. The applied patch generated 447 javac compiler warnings (more than the trunk's current 442 warnings).
-1 findbugs. The patch appears to introduce 1 new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2732/testReport/
This message is automatically generated.
On the Findbugs report: the use of a static variable is intentional here. It allows the same outputs to be used from different components running in a Map or Reduce without having to pass the MultipleOutputs instance as a parameter. It leverages the fact that tasks run in their own JVM.
The core-test error does not seem related.
The patch has a bug: multi named outputs are not created properly.
It also uses a static variable for the record writers, which is not correct (this was to handle a special case, but it may lead to errors).
Fixes file creation and handling for multi named outputs.
Makes the map of record writers an instance variable and uses a set for the named output names, making the code cleaner by not overloading the use of the record writers map.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12384738/patch3149.txt
against trunk revision 671563.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 6 new or modified tests.
-1 javadoc. The javadoc tool appears to have generated 1 warning messages.
-1 javac. The applied patch generated 449 javac compiler warnings (more than the trunk's current 442 warnings).
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
-1 contrib tests. The patch failed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2746/testReport/
This message is automatically generated.
This looks ready to go other than the javac and javadoc warnings. Please fix them. Sorry this has been such a long process on this feature!
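The multi named outputs added in the later patches (one declared output, with a single OutputFormat, key class and value class, fanning out to files whose names the task chooses at runtime) can be sketched in plain Java. MultiNamedOutputs, the "namedOutput_multiName" file-name scheme and the letters-and-digits restriction on the multi name are all assumptions for illustration, not necessarily what the committed patch does:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical sketch of multi named outputs with in-memory "files".
class MultiNamedOutputs {
  // Assumed restriction: letters and digits only, so a multi name can never
  // contain the '_' separator used in the file name.
  private static final Pattern MULTI_NAME = Pattern.compile("[A-Za-z0-9]+");

  private final Set<String> multiNamedOutputs = new HashSet<>();
  private final Map<String, List<String>> files = new LinkedHashMap<>();

  void addMultiNamedOutput(String namedOutput) {
    multiNamedOutputs.add(namedOutput);
  }

  void collect(String namedOutput, String multiName, Object key, Object value) {
    if (!multiNamedOutputs.contains(namedOutput)) {
      throw new IllegalArgumentException("Not a multi named output: " + namedOutput);
    }
    if (!MULTI_NAME.matcher(multiName).matches()) {
      throw new IllegalArgumentException("Invalid multi name: " + multiName);
    }
    // Hypothetical naming scheme for the per-multiName file.
    String fileName = namedOutput + "_" + multiName;
    files.computeIfAbsent(fileName, f -> new ArrayList<>()).add(key + "\t" + value);
  }

  Set<String> fileNames() {
    return files.keySet();
  }

  List<String> records(String fileName) {
    return files.get(fileName);
  }
}
```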
Just a reflection or a forward-looking thought: would it also be better to change the reduce contract to:
public void reduce(Object key, Iterator values, MultipleOutputs mos, Reporter reporter) throws IOException {
Object value = values.next();
mos.collect("default", new Text("a"), value, reporter);
mos.collect("text", key, new Text("Hello"), reporter);
mos.collect("sequence", value, new MapWritable(), reporter);
}
Hopefully the future context-based API will take care of this.
Fixing javadoc/javac warnings.
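The reduce contract Runping floats above, where the reducer receives only a named-output sink and the regular job output is just the "default" output, can be sketched in plain Java. NamedSink, MultiOutputReducer, ExampleReducer and RecordingSink are hypothetical names for illustration; this is not an API Hadoop provides:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical single output sink that routes records by named output.
interface NamedSink {
  void collect(String namedOutput, Object key, Object value);
}

// Hypothetical reduce contract: no separate OutputCollector, only the sink.
interface MultiOutputReducer<K, V> {
  void reduce(K key, Iterator<V> values, NamedSink out);
}

// Mirrors the example above: job output goes to "default", extra data to "text".
class ExampleReducer implements MultiOutputReducer<Long, String> {
  @Override
  public void reduce(Long key, Iterator<String> values, NamedSink out) {
    while (values.hasNext()) {
      String value = values.next();
      out.collect("default", "a", value);
      out.collect("text", key, "Hello");
    }
  }
}

// Records every call so the routing can be inspected.
class RecordingSink implements NamedSink {
  final List<String> calls = new ArrayList<>();

  @Override
  public void collect(String namedOutput, Object key, Object value) {
    calls.add(namedOutput + ":" + key + "=" + value);
  }
}
```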
The MultipleOutputs getNamedOutputs() method is protected; it should be public.
making MultipleOutputs getNamedOutputs() method public
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12385371/patch3149.txt
against trunk revision 674592.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 6 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 javac. The applied patch generated 448 javac compiler warnings (more than the trunk's current 442 warnings).
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
-1 contrib tests. The patch failed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2803/testReport/
This message is automatically generated.
Trying once more to see if it passes the javac warning check; the only warnings I see are in the test cases.
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12385462/patch3149.txt
against trunk revision 674932.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 6 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2817/testReport/
This message is automatically generated.
I just committed this. Thanks Alejandro for being so patient on this one!
Created
Integrated in Hadoop-trunk #581 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/581/)
I think the multiple output format class in https://issues.apache.org/jira/browse/HADOOP-2906 should fit your needs well.
With that class, you can write different key/value pairs to different output files, all under your control.