
Key: HADOOP-3295
Type: Improvement
Status: Closed
Resolution: Fixed
Priority: Minor
Assignee: Zheng Shao
Reporter: Zheng Shao
Votes: 0
Watchers: 0
Hadoop Common

Allow TextOutputFormat to use configurable separators

Created: 22/Apr/08 02:03 AM   Updated: 13/Dec/08 12:38 AM
Component/s: io
Affects Version/s: None
Fix Version/s: 0.18.0

Time Tracking:
Not Specified

File Attachments:
3295-2.patch (7 kB), 2008-04-23 08:57 PM, Zheng Shao, licensed for inclusion in ASF works
3295.patch (3 kB), 2008-04-22 02:06 AM, Zheng Shao, licensed for inclusion in ASF works
Issue Links:
Reference
 

Hadoop Flags: Reviewed
Resolution Date: 25/Apr/08 08:03 PM


Description
TextOutputFormat uses a hardcoded tab as the key-value separator. We should allow configurable separators such as ^A.

Zheng Shao added a comment - 22/Apr/08 02:06 AM
This patch adds the configuration parameter.
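As a rough illustration of what a configurable separator means here, the sketch below joins a key and a value with a separator looked up from a configuration object, falling back to a tab when nothing is set. It is not the patch itself: the class SeparatorSketch and its methods are hypothetical, java.util.Properties stands in for the job configuration, and the property name is the one quoted later in this thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical stand-in for the output format's record writer; not Hadoop code.
class SeparatorSketch {
    // Property name as quoted in a later comment on this issue.
    static final String SEPARATOR_KEY = "mapred.textoutputformat.separator";

    // Writes "key<sep>value\n", using a tab when the property is not set.
    static void writeLine(DataOutputStream out, Properties conf,
                          String key, String value) throws IOException {
        String sep = conf.getProperty(SEPARATOR_KEY, "\t");
        out.write((key + sep + value + "\n").getBytes(StandardCharsets.UTF_8));
    }

    // Convenience helper: renders a single record to a String.
    static String render(Properties conf, String key, String value) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            writeLine(out, conf, key, value);
        }
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

With an empty Properties object the record is tab-separated; setting the property to "," or "\u0001" changes only the separator between key and value.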

Milind Bhandarkar added a comment - 22/Apr/08 03:14 AM
This is great !!!

I have been requesting this for a long time !!!!

Thanks Zheng !

Committers, please please please take a serious look at this !


Hadoop QA added a comment - 23/Apr/08 03:00 AM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12380659/3295.patch
against trunk revision 645773.

@author +1. The patch does not contain any @author tags.

tests included -1. The patch doesn't appear to include any new or modified tests.
Please justify why no tests are needed for this patch.

javadoc +1. The javadoc tool did not generate any warning messages.

javac +1. The applied patch does not generate any new javac compiler warnings.

release audit +1. The applied patch does not generate any new release audit warnings.

findbugs +1. The patch does not introduce any new Findbugs warnings.

core tests +1. The patch passed core unit tests.

contrib tests +1. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2299/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2299/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2299/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2299/console

This message is automatically generated.


Owen O'Malley added a comment - 23/Apr/08 06:39 AM
Zheng, please include a test for the new functionality.

Runping Qi added a comment - 23/Apr/08 12:26 PM

Note that you have made public api changes:

public LineRecordWriter(DataOutputStream out)

into

public LineRecordWriter(DataOutputStream out, String keyValueSeparator) {

It is better to keep the original one as an overloaded constructor:

public LineRecordWriter(DataOutputStream out) {
    // Delegate to the two-argument constructor with the default tab separator.
    this(out, "\t");
}
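The suggestion above can be sketched as a small self-contained class. This is only an illustration of constructor chaining under assumed names: LineWriterSketch is a hypothetical, simplified stand-in for LineRecordWriter and works on plain Strings rather than Hadoop's key and value types.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified stand-in for LineRecordWriter; not the Hadoop class.
class LineWriterSketch {
    private final DataOutputStream out;
    private final byte[] keyValueSeparator;

    // New constructor: the caller chooses the separator.
    LineWriterSketch(DataOutputStream out, String keyValueSeparator) {
        this.out = out;
        this.keyValueSeparator = keyValueSeparator.getBytes(StandardCharsets.UTF_8);
    }

    // Original signature kept as an overload, so existing callers still compile.
    LineWriterSketch(DataOutputStream out) {
        this(out, "\t");
    }

    // Writes key, separator, value, newline.
    void write(String key, String value) throws IOException {
        out.write(key.getBytes(StandardCharsets.UTF_8));
        out.write(keyValueSeparator);
        out.write(value.getBytes(StandardCharsets.UTF_8));
        out.write('\n');
    }
}
```

Code written against the one-argument constructor keeps its tab-separated output, while new code can pass any separator string.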

Zheng Shao added a comment - 23/Apr/08 08:57 PM
Added a test for customized separator.

Added a constructor with the old prototype to make sure user code does not break because of the patch.


Hadoop QA added a comment - 23/Apr/08 11:45 PM
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12380797/3295-2.patch
against trunk revision 645773.

@author +1. The patch does not contain any @author tags.

tests included +1. The patch appears to include 3 new or modified tests.

javadoc +1. The javadoc tool did not generate any warning messages.

javac +1. The applied patch does not generate any new javac compiler warnings.

release audit +1. The applied patch does not generate any new release audit warnings.

findbugs +1. The patch does not introduce any new Findbugs warnings.

core tests +1. The patch passed core unit tests.

contrib tests +1. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2312/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2312/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2312/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2312/console

This message is automatically generated.


Chris Douglas added a comment - 25/Apr/08 08:03 PM
I just committed this. Thanks, Zheng

Hudson added a comment - 26/Apr/08 12:17 PM

Suhas Gogate added a comment - 10/Dec/08 01:00 AM
The feature added by this JIRA fails when the separator is set to a character that is invalid in XML, such as ctrl-A, e.g. mapred.textoutputformat.separator = "\u0001"

For example:
String delim = "\u0001";
Conf.set("mapred.textoutputformat.separator", delim);

The job client serializes the JobConf with mapred.textoutputformat.separator set to "\u0001" (ctrl-A). The problem occurs when the JobTracker deserializes it (reads it back) and encounters the invalid XML character.

The test for this feature, testFormatWithCustomSeparator(), does not serialize the JobConf after setting the separator to ctrl-A, and hence does not detect this specific problem.
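The parsing failure can be reproduced outside Hadoop with only the JDK's XML parser. The sketch below is an assumption-laden illustration, not Hadoop's serialization code: ControlCharXmlSketch and its parses method are hypothetical, and the XML document is built by hand to resemble a configuration file whose value holds a character reference.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper; mimics reading back a serialized configuration.
class ControlCharXmlSketch {
    // Returns true if a configuration-like document with the given
    // <value> markup parses, false if the parser rejects it.
    static boolean parses(String valueMarkup) throws Exception {
        String xml = "<?xml version=\"1.0\"?><configuration><property>"
                + "<name>mapred.textoutputformat.separator</name>"
                + "<value>" + valueMarkup + "</value>"
                + "</property></configuration>";
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors without printing to stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXParseException e) {
            // XML 1.0 does not allow U+0001, even as a character reference.
            return false;
        }
    }
}
```

Calling parses("&#1;") returns false, while parses("&#9;") (a tab) returns true, which is consistent with the default separator never triggering the problem.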

Here is an exception:

08/12/06 01:40:50 INFO mapred.FileInputFormat: Total input paths to process : 1
org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.RuntimeException: org.xml.sax.SAXParseException: Character reference "&#1" is an invalid XML character.
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:961)
at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:864)
at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:832)
at org.apache.hadoop.conf.Configuration.get(Configuration.java:291)
at org.apache.hadoop.mapred.JobConf.getJobPriority(JobConf.java:1163)
at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:179)
at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:1783)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)

at org.apache.hadoop.ipc.Client.call(Client.java:715)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
at org.apache.hadoop.mapred.$Proxy1.submitJob(Unknown Source)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:788)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1026)
at


Zheng Shao added a comment - 10/Dec/08 01:53 AM
Can you open a separate jira and mark this one as related? Then we can discuss from there and produce a fix.