Issue Details (XML | Word | Printable)

Key: HADOOP-3665
Type: Bug Bug
Status: Closed Closed
Resolution: Fixed
Priority: Minor Minor
Assignee: Chris Douglas
Reporter: Lukas Vlcek
Votes: 0
Watchers: 1
Operations

If you were logged in you would be able to see more operations.
Hadoop Common

WritableComparator newKey() fails for NullWritable

Created: 29/Jun/08 09:41 PM   Updated: 22/Aug/08 07:50 PM
Component/s: io
Affects Version/s: 0.16.0
Fix Version/s: 0.18.0

Time Tracking:
Not Specified

File Attachments:
  Size
Text File Licensed for inclusion in ASF works 3665-0.patch 2008-07-01 07:04 AM Chris Douglas 10 kB
Text File Licensed for inclusion in ASF works 3665-1.patch 2008-07-01 09:24 AM Chris Douglas 10 kB
Text File Licensed for inclusion in ASF works 3665-2.patch 2008-07-02 08:21 PM Chris Douglas 9 kB
File Licensed for inclusion in ASF works HADOOP-3665.path 2008-06-29 10:13 PM Lukas Vlcek 7 kB
Environment: n/a
Issue Links:
Reference
 

Hadoop Flags: Reviewed, Incompatible change
Resolution Date: 03/Jul/08 11:47 PM


 Description  « Hide
It is not possible to use NullWritable as a key in order to suppress key value in output.

Syndrome exception:
Caused by: java.lang.IllegalAccessException: Class org.apache.hadoop.io.WritableComparator can not access a member of class org.apache.hadoop.io.NullWritable with modifiers "private"

The problem is that NullWritable is a singleton and does not provide public non-parametric constructor. The following code in WritableComparator causes the exception: return (WritableComparable)keyClass.newInstance();

Proposed simple solution is to use ReflectionUtils instead (it requires modification as well).

This issue is probably related to HADOOP-2922



 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Lukas Vlcek added a comment - 29/Jun/08 10:13 PM
Here is simple patch. Please review it!

Still there are few questions/notes opened I think:
1) There is no clear way how to learn whether particular WritableComparator can be instantiated or not. There should be some new interface introduced for singletons.
2) It is not clear if key value can be of zero length. For NullWritable it make sense but it has nothing to do with the fact that it is a singleton I think.
3) I found other two classes implementing WritableComparable having private or protected constructor. These are ID and TaskID. Since they are both WritableComparable they may be used as a key (I can not tell if there is any point in such use case but technically it should be possible). This patch DOES NOT fixes such use case in any way.

Giving these notes together I can see the following conclusion:
Going forward it should be possible to learn if WritableComparable:
a) can be used as a key
b) can have zero length
c) can be instantiated using constructor and if not then what is the way to get the instance.


Chris Douglas added a comment - 30/Jun/08 09:59 PM
  • NullWritable should define a RawComparator, so it doesn't need to use the default WritableComparator rather than special-casing NullWritable
  • Your apt use of ReflectionUtils calls out a separate issue with WritableComparator, i.e. that the keys and values it uses are not Configured, where in most contexts they are. This should probably be a separate JIRA...
  • SequenceFile shouldn't make a special case of NullWritable, either. Does the rest of the logic still work if the expression is changed from keylength == 0 to keylength < 0?
  • ReflectionUtils shouldn't special-case NullWritable, either. Are you certain this is necessary? Doesn't setAccessible take care of this?

Lukas Vlcek added a comment - 30/Jun/08 11:43 PM
The whole point is that I would like to understand how Reduce job can output a file without any key values in it. The NullWritable seemed to be an ideal candidate for this but unfortunately I ran into exceptions when trying it. So I made a quick and dirty fix which is not meant to be a production ready (obviously NullWritable should not be special-cased in any way!).

On the other hand there seemed to be some questions which need to be asked and possible addressed. One of them is that ReflectionUtils is able to call any constructor after setAccessible is set to true but is this what we really want for singleton keys? And do we really need singleton keys at all? (I believe the answer is positive). How about size (length) of key value? Is it allowed to be zero? And why WritableComparato calls to newInstance method while this causes issues with any class having non-public constructor?

These were the questions I was trying to acknowledge while learning Hadoop and trying to experiment with some simple jobs having no Reduce output key...


Chris Douglas added a comment - 01/Jul/08 07:04 AM

The whole point is that I would like to understand how Reduce job can output a file without any key values in it. The NullWritable seemed to be an ideal candidate for this but unfortunately I ran into exceptions when trying it. So I made a quick and dirty fix which is not meant to be a production ready (obviously NullWritable should not be special-cased in any way!).

I'm sorry, I hadn't understood this. If you only want to output null keys from your reduce, then the RecordWriter used by your OutputFormat can encode or ignore null keys (e.g. TextOutputFormat). SequenceFiles, as you discovered, explicitly disallow zero-length keys, so you'll need to pick a different binary file format to store output records. Glancing at the code, this constraint is inconsistently enforced, and not for any particular reason that I can discern. Adapting SequenceFile to handle zero-length keys might be as simple as allowing zero-length keys from the Writers, since the Reader looks like it could handle it.

On the other hand there seemed to be some questions which need to be asked and possible addressed. One of them is that ReflectionUtils is able to call any constructor after setAccessible is set to true but is this what we really want for singleton keys? And do we really need singleton keys at all? (I believe the answer is positive).

There's already a fair amount of object reuse. We need an object to deserialize into per the Writable contract, so a registration system like the one in WritableComparator would be necessary in ReflectionUtils to make singletons work (i.e. a map of classes to instances checked before the map of classes to constructors). Other than NullWritable, all of the sane use cases I can think of are just badly designed, but there are likely good ones.

How about size (length) of key value? Is it allowed to be zero?

It depends on where in the framework you're looking. The OutputFormat defines how to encode/handle null/NullWritable keys from the reduce (or the map if you're running without reduces). In 0.17, intermediate data is stored in SequenceFiles, so zero-length keys can't be emitted from the map. In 0.18, zero-length keys are supported, but their semantics are kind of odd. In most cases, emitting NullWritable keys from the map is not a scalable design.

And why WritableComparato calls to newInstance method while this causes issues with any class having non-public constructor?

Most WritableComparable types use RawComparator, which provides much better performance while rendering this consideration irrelevant. Unfortunately, WritableComparator creates new instances of its internal keys whether it requires them or not! This is easily remedied. This patch does the following:

  • No longer creates instances of the WritableComparable in WritableComparator when a class has registered a WritableComparator (neither does it create a buffer). This makes super.compare(byte[], off1, len1, byte[], off2, len2) illegal, but I doubt this is a problem. Though one could imagine a situation where a raw comparator attempts an efficient comparison but uses the slow comparator when the result is ambiguous, such a comparator is easily adapted.
  • Lets WritableComparators be configurable, so WritableComparable objects not defining RawComparators are still configured before being compared
  • Defines a raw comparator for NullWritable
  • Changes checks in SequenceFile Writer classes to check only for key lengths less than zero; this doesn't require any changes to the Reader, which already supports zero-length keys, so the SequenceFile version doesn't need to be adjusted, either.
  • Adds a test case for reading/writing NullWritable keys.

Hadoop QA added a comment - 01/Jul/08 08:49 AM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12385009/3665-0.patch
against trunk revision 672976.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

-1 core tests. The patch failed core unit tests.

-1 contrib tests. The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2775/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2775/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2775/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2775/console

This message is automatically generated.


Chris Douglas added a comment - 01/Jul/08 09:20 AM
Corrected WritableComparator creation in SequenceFile::Sorter

Hadoop QA added a comment - 01/Jul/08 10:56 AM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12385023/3665-1.patch
against trunk revision 672976.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

+1 core tests. The patch passed core unit tests.

-1 contrib tests. The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2776/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2776/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2776/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2776/console

This message is automatically generated.


Arun C Murthy added a comment - 02/Jul/08 12:24 AM - edited
I'm ok with this patch, though making WritableComparator a Configurable might be a stretch... given that we don't have and Configurable Writables. But I'm happy to bow down to the wisdom of the crowd.

Chris Douglas added a comment - 02/Jul/08 08:21 PM
I think Arun is right; the approach in 3665-1 wouldn't have accomplished what it set out to do, anyway. Attached a patch removing the Configurable stuff.

Lukas Vlcek added a comment - 02/Jul/08 09:46 PM
I tested 3665-2 with my specific use case (NullWritable in reduced key) and it is working fine! Thanks guys.

Lukas Vlcek added a comment - 02/Jul/08 09:49 PM
BTW: Shouldn't this issue be renamed to something more descriptive?

Hadoop QA added a comment - 02/Jul/08 10:13 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12385142/3665-2.patch
against trunk revision 673445.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

+1 core tests. The patch passed core unit tests.

-1 contrib tests. The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2784/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2784/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2784/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2784/console

This message is automatically generated.


Arun C Murthy added a comment - 03/Jul/08 08:59 PM
Looks good. I think we should just go ahead and mark this as an incompatible change for the fact that the key1/key2 fields aren't constructed for the protected WritableComparator constructor, a break from the past.

Chris Douglas added a comment - 03/Jul/08 11:47 PM
I just committed this.

Hudson added a comment - 22/Aug/08 12:34 PM