
Key: HADOOP-2906
Type: New Feature
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Runping Qi
Reporter: Runping Qi
Votes: 0
Watchers: 1
Hadoop Common

output format classes that can write to different files depending on keys and/or config variable

Created: 26/Feb/08 11:28 PM   Updated: 08/Jul/09 04:52 PM
Component/s: None
Affects Version/s: None
Fix Version/s: 0.17.0

Time Tracking:
Not Specified

File Attachments:
patch.2096.6.txt, 19 kB, 2008-03-05 03:53 PM, Runping Qi (licensed for inclusion in ASF works)

Resolution Date: 06/Mar/08 02:59 AM


Description
I have a few applications that need to write data to different files or directories depending on keys and/or configuration variables.
I have implemented such classes for those applications, and I have noticed that many other users have a similar need from time to time.
So I think it would be a good idea to contribute them to the Hadoop mapred.lib package so that other users can benefit.

Runping Qi added a comment - 28/Feb/08 06:43 AM

The attached patch includes a common abstract base class (MultipleOutputFormat) and two concrete classes:
MultipleTextOutputFormat and MultipleSequenceFileOutputFormat. By default, these classes behave
the same as the TextOutputFormat and SequenceFileOutputFormat classes, respectively.
Users can subclass these classes and override one of the protected methods to implement specific logic
for writing data to different output files.
The patch also contains a test case, which illustrates two particular ways of using these classes.
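
As a rough illustration of the subclassing pattern described above, here is a self-contained sketch that does not use Hadoop at all. The names MultiOutputSketch, fileNameFor, and KeyPrefixOutput are hypothetical stand-ins, and in-memory buffers replace real output files; the actual method names and signatures in the patch may differ.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the abstract base class: it routes each record
// to a per-file buffer chosen by an overridable hook.
abstract class MultiOutputSketch {
  // One buffer per output file name, created lazily on first write.
  private final Map<String, StringBuilder> outputs = new TreeMap<>();

  // Default behavior: every record goes to the single default file name,
  // mirroring a plain single-file output format.
  protected String fileNameFor(String key, String value, String defaultName) {
    return defaultName;
  }

  public void write(String key, String value) {
    String name = fileNameFor(key, value, "part-00000");
    outputs.computeIfAbsent(name, n -> new StringBuilder())
        .append(key).append('\t').append(value).append('\n');
  }

  public Map<String, StringBuilder> outputs() {
    return outputs;
  }
}

// Subclass that overrides the hook: records are split into directories
// named after the first character of the key.
class KeyPrefixOutput extends MultiOutputSketch {
  @Override
  protected String fileNameFor(String key, String value, String defaultName) {
    return key.isEmpty() ? defaultName : key.substring(0, 1) + "/" + defaultName;
  }
}
```

With this sketch, writing the keys "apple", "avocado", and "banana" through KeyPrefixOutput yields two buffers, a/part-00000 and b/part-00000, while a subclass that leaves the hook alone keeps everything in part-00000.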


Hadoop QA added a comment - 28/Feb/08 08:12 AM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12376696/patch.2096.txt
against trunk revision 619744.

@author +1. The patch does not contain any @author tags.

tests included +1. The patch appears to include 3 new or modified tests.

javadoc +1. The javadoc tool did not generate any warning messages.

javac -1. The applied patch generated 620 javac compiler warnings (more than the trunk's current 619 warnings).

release audit +1. The applied patch does not generate any new release audit warnings.

findbugs +1. The patch does not introduce any new Findbugs warnings.

core tests +1. The patch passed core unit tests.

contrib tests +1. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1859/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1859/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1859/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1859/console

This message is automatically generated.


Runping Qi added a comment - 28/Feb/08 02:47 PM

I think the extra javac warning is due to the
@SuppressWarnings("unchecked") directive in the following code

      @SuppressWarnings("unchecked")
      public void write(WritableComparable key, Writable value) throws IOException {

        // get the file name based on the key
        String keyBasedPath = generateFileNameForKey(key, myName);

        // get the file name based on the input file name
        String finalPath = getInputFileBasedOutputFileName(myJob, keyBasedPath);

        // get the actual key
        WritableComparable actualKey = generateActualKey(key);

        RecordWriter rw = this.recordWriters.get(finalPath);
        if (rw == null) {
          // if we don't have the record writer yet for the final path, create one
          // and add it to the cache
          rw = getRecordWriter_inner(myFS, myJob, finalPath, myProgressable);
          this.recordWriters.put(finalPath, rw);
        }
        rw.write(actualKey, value);
      }

javac warns about the call
rw.write(actualKey, value)
because rw is declared as the raw RecordWriter type rather than a parameterized one.
It is declared that way because rw may be a record writer created by SequenceFileOutputFormat,
which does not produce a parameterized RecordWriter. I tried a few ways to get rid of the warning, but all of them failed.
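
The thread does not record how the warning was eventually removed. The following is only a sketch of one general way to avoid a raw-type call in a writer cache, using hypothetical stand-ins (SketchWriter, WriterCache) instead of the Hadoop classes: if the cache and the writers share the same type parameters, the write call is fully typed and no suppression is needed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for a parameterized record writer.
interface SketchWriter<K, V> {
  void write(K key, V value);
}

// Caches one writer per output path. Because the cache and the writers
// share the type parameters K and V, write() needs no raw types.
class WriterCache<K, V> {
  private final Map<String, SketchWriter<K, V>> writers = new HashMap<>();
  // Records what was written, as "path:key=value", for demonstration only.
  final List<String> log = new ArrayList<>();

  void write(String path, K key, V value) {
    SketchWriter<K, V> w = writers.get(path);
    if (w == null) {
      // Create the writer on first use and remember it for later records.
      w = (k, v) -> log.add(path + ":" + k + "=" + v);
      writers.put(path, w);
    }
    w.write(key, value);
  }

  int openWriters() {
    return writers.size();
  }
}
```

This only works when every writer placed in the cache can itself be given the parameterized type, which is exactly the difficulty described above for writers obtained from SequenceFileOutputFormat.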


Runping Qi added a comment - 28/Feb/08 05:29 PM

Finally managed to get rid of the javac warning.


Hadoop QA added a comment - 28/Feb/08 06:48 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12376745/patch.2096.1.txt
against trunk revision 619744.

@author +1. The patch does not contain any @author tags.

tests included +1. The patch appears to include 3 new or modified tests.

javadoc +1. The javadoc tool did not generate any warning messages.

javac -1. The applied patch generated 616 javac compiler warnings (more than the trunk's current 615 warnings).

release audit +1. The applied patch does not generate any new release audit warnings.

findbugs +1. The patch does not introduce any new Findbugs warnings.

core tests -1. The patch failed core unit tests.

contrib tests +1. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1862/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1862/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1862/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1862/console

This message is automatically generated.


Runping Qi added a comment - 28/Feb/08 11:08 PM

There was a javac warning in the test class.
The new patch fixes it.


Runping Qi added a comment - 29/Feb/08 01:52 AM
Incorporated some feedback comments.

Hadoop QA added a comment - 29/Feb/08 02:47 AM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12376775/patch.2096.2.txt
against trunk revision 619744.

@author +1. The patch does not contain any @author tags.

tests included +1. The patch appears to include 3 new or modified tests.

javadoc +1. The javadoc tool did not generate any warning messages.

javac +1. The applied patch does not generate any new javac compiler warnings.

release audit +1. The applied patch does not generate any new release audit warnings.

findbugs +1. The patch does not introduce any new Findbugs warnings.

core tests -1. The patch failed core unit tests.

contrib tests +1. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1869/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1869/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1869/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1869/console

This message is automatically generated.


Hadoop QA added a comment - 29/Feb/08 05:14 AM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12376785/patch.2096.3.txt
against trunk revision 619744.

@author +1. The patch does not contain any @author tags.

tests included +1. The patch appears to include 3 new or modified tests.

javadoc +1. The javadoc tool did not generate any warning messages.

javac +1. The applied patch does not generate any new javac compiler warnings.

release audit +1. The applied patch does not generate any new release audit warnings.

findbugs +1. The patch does not introduce any new Findbugs warnings.

core tests -1. The patch failed core unit tests.

contrib tests +1. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1872/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1872/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1872/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1872/console

This message is automatically generated.


Runping Qi added a comment - 04/Mar/08 07:25 PM

The previously attached patch was wrong.
Attaching the correct version now.


Chris Douglas added a comment - 04/Mar/08 09:37 PM
A couple of suggestions:
  • If "num.of.trailing.legs.to.use" exceeds the number of segments in the input file path string, then this will throw an IllegalArgumentException from Path. A more helpful message should probably accompany this condition.
  • It might be worth calling out in the javadocs that generateActualKey and generateActualValue should be aware of side-effects, since write typically doesn't modify its args and the framework will reuse them. The code is clear enough that users can educate themselves, but this is deserving of a footnote.

Otherwise, +1
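
To make the side-effect concern concrete, here is a small sketch with a hypothetical mutable key class (MutableKey) standing in for a reusable key object. It contrasts a hook that rewrites its argument in place with one that returns a fresh object; neither method is taken from the patch.

```java
// Hypothetical mutable key, standing in for a key object that the
// framework may reuse across calls.
class MutableKey {
  String text;

  MutableKey(String text) {
    this.text = text;
  }
}

class ActualKeySketch {
  // Risky: strips the prefix in place, so the caller's object changes too.
  static MutableKey stripPrefixInPlace(MutableKey key) {
    key.text = key.text.substring(key.text.indexOf(':') + 1);
    return key;
  }

  // Safer: leaves the argument untouched and returns a new object.
  static MutableKey stripPrefixCopy(MutableKey key) {
    return new MutableKey(key.text.substring(key.text.indexOf(':') + 1));
  }
}
```

After stripPrefixInPlace, any code that still holds the original key sees the stripped text, which is the kind of surprise a javadoc note would warn about.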


Martin Traverso added a comment - 04/Mar/08 10:05 PM
I would suggest changing the name of the property from "num.of.trailing.legs.to.use" to something that reflects the hierarchy in which the property lives. Maybe something like mapred.output.format.multi.trailingLegs or similar.

Runping Qi added a comment - 05/Mar/08 03:53 PM

Replaced the attribute name "num.of.trailing.legs.to.use" with "mapred.outputformat.numOfTrailingLegs".

Addressed the case where the number specified by the above variable is larger than the number of legs
in the input file path.
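
The comment does not say how the oversized case is handled in the patch. As a sketch under that caveat, the helper below (hypothetical name trailingLegs) keeps the last N path segments and, as an assumed policy, falls back to using all segments when N is larger than the number available.

```java
import java.util.ArrayList;
import java.util.List;

class TrailingLegsSketch {
  // Returns the last numLegs non-empty segments of path, joined by '/'.
  // If numLegs exceeds the number of segments, all segments are used
  // (an assumed policy; the patch may handle this case differently).
  static String trailingLegs(String path, int numLegs) {
    if (numLegs < 0) {
      throw new IllegalArgumentException(
          "number of trailing legs must be non-negative, got " + numLegs);
    }
    List<String> legs = new ArrayList<>();
    for (String leg : path.split("/")) {
      if (!leg.isEmpty()) {
        legs.add(leg);
      }
    }
    // Clamp the start index so an oversized numLegs cannot go negative.
    int start = Math.max(0, legs.size() - numLegs);
    return String.join("/", legs.subList(start, legs.size()));
  }
}
```

For example, trailingLegs("/data/2008/03/part-00000", 2) keeps "03/part-00000", while asking for 5 legs of "/data/part-00000" keeps both available segments instead of failing.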


Hadoop QA added a comment - 05/Mar/08 09:49 PM
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12377175/patch.2096.6.txt
against trunk revision 619744.

@author +1. The patch does not contain any @author tags.

tests included +1. The patch appears to include 3 new or modified tests.

javadoc +1. The javadoc tool did not generate any warning messages.

javac +1. The applied patch does not generate any new javac compiler warnings.

release audit +1. The applied patch does not generate any new release audit warnings.

findbugs +1. The patch does not introduce any new Findbugs warnings.

core tests +1. The patch passed core unit tests.

contrib tests +1. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1897/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1897/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1897/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1897/console

This message is automatically generated.


Chris Douglas added a comment - 06/Mar/08 02:59 AM
I just committed this. Thanks, Runping!

Hudson added a comment - 06/Mar/08 12:27 PM