Hadoop Map/Reduce
MAPREDUCE-3516

Possible NPE in org.apache.hadoop.mapred.MapTask.MapOutputBuffer.collect(K key, V value, int partition)

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.20.1, 0.20.2, 0.20.203.0, 0.20.205.0, 0.23.0
    • Fix Version/s: None
    • Component/s: mrv2
    • Labels: None

      Description

      org.apache.hadoop.mapred.MapTask.MapOutputBuffer.collect accepts a key and a value. If a null key or value is somehow passed to the method, we can end up with an NPE.
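      For illustration only (this mapper is not part of the original report), a user job along these lines would hand a null key to the map output collector and currently surface as a bare NullPointerException from inside the framework:

        import java.io.IOException;
        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapred.MapReduceBase;
        import org.apache.hadoop.mapred.Mapper;
        import org.apache.hadoop.mapred.OutputCollector;
        import org.apache.hadoop.mapred.Reporter;

        // Illustrative mapper: if the computed key ends up null, the null travels
        // through OutputCollector.collect() into MapOutputBuffer.collect() and
        // shows up as a bare NPE deep inside the framework.
        public class NullEmittingMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, Text> {

          public void map(LongWritable offset, Text line,
              OutputCollector<Text, Text> output, Reporter reporter)
              throws IOException {
            Text key = line.getLength() == 0 ? null : new Text(line); // may be null
            output.collect(key, new Text("1")); // null key -> NPE in collect()
          }
        }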

        Activity

        Subroto Sanyal created issue -
        Harsh J added a comment -

        Doesn't this fall into user-error territory? Is this open to adding guards that catch NPEs and raise appropriate messages? I'm fine with that – just asking if that is the goal of this JIRA.

        Subroto Sanyal added a comment -

        Hi Harsh,
        I agree, passing null can be a user mistake/error, but I feel the framework should be capable of handling it rather than just exposing a bare NPE.

        Bhallamudi Venkata Siva Kamesh added a comment -

        I think updating the API doc is enough, just as some of the Java library APIs do.

        Throws:
          NullPointerException - If the key or value argument is null.
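
          As a sketch of how that suggestion could look in practice (this stand-in class is illustrative only; the real collect() lives inside org.apache.hadoop.mapred.MapTask.MapOutputBuffer):

            import java.io.IOException;

            // Stripped-down stand-in showing where the suggested @throws note
            // would sit; it is not the real MapOutputBuffer implementation.
            class MapOutputBufferDocSketch<K, V> {

              /**
               * Serializes the key/value pair into the in-memory output buffer.
               *
               * @throws IOException if the buffer cannot be written
               * @throws NullPointerException if the key or value argument is null
               */
              public synchronized void collect(K key, V value, int partition)
                  throws IOException {
                if (key == null || value == null) {
                  throw new NullPointerException("key and value must not be null");
                }
                // real implementation: type check, then serialize into the sort buffer
              }
            }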
        
        Harsh J added a comment -

        I'm +1 for better error messages over javadocs. I think this is more of a user-error situation than a behaviour/warning one.

        Would this regress anything with the various serializations, which may even accept 'null' as a valid k/v? Or does a null have no chance of passing at all, since we attempt to call general methods on the key/value instances?

        Subroto Sanyal added a comment -

        @Harsh
        If I am not wrong, the K/V will be Writables on which some methods will be executed, like:

        • key.getClass()
        • public void serialize(Writable w) throws IOException {
                w.write(dataOut);
              }

        So I feel passing a null key/value will not be accepted by the framework. If the user wants, they can use NullWritable for any such purpose.

        What do you say?
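
        As a side note to the NullWritable point, a minimal usage sketch (the class name here is illustrative; the job would also need setMapOutputKeyClass(NullWritable.class) on its JobConf):

          import java.io.IOException;
          import org.apache.hadoop.io.LongWritable;
          import org.apache.hadoop.io.NullWritable;
          import org.apache.hadoop.io.Text;
          import org.apache.hadoop.mapred.MapReduceBase;
          import org.apache.hadoop.mapred.Mapper;
          import org.apache.hadoop.mapred.OutputCollector;
          import org.apache.hadoop.mapred.Reporter;

          // Instead of emitting a literal null, emit NullWritable.get(): a
          // singleton Writable that serializes to zero bytes and survives the
          // framework's getClass()/serialize() calls.
          public class NullWritableKeyMapper extends MapReduceBase
              implements Mapper<LongWritable, Text, NullWritable, Text> {

            public void map(LongWritable offset, Text line,
                OutputCollector<NullWritable, Text> output, Reporter reporter)
                throws IOException {
              output.collect(NullWritable.get(), line); // "no key", but never null
            }
          }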

        Harsh J added a comment -

        Yes, I guessed so too. The .getClass() call we do for type checking will be the death of a 'null' at entry itself.

        Otherwise, the serializers are now entirely pluggable and type agnostic – they are not limited to just Writables, and if I wanted to write a generic serializer with <Object, Object>, I could do that as well (and nulls may sometimes be OK then). This is what I was wondering about – but given the first point, I think it is not possible to pass a genuine null through at all.

        I think it is alright to look for nulls before the map output buffer runs into them (while doing that getClass() check) and show a proper message ("Received k/v was null" or so?).
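
        A minimal sketch (not the committed patch, and the helper name is made up) of the kind of early check and message being discussed:

          import java.io.IOException;

          // Hypothetical guard: fail fast with a descriptive message before the
          // getClass() type check or the serializer trips over a bare NPE.
          final class CollectNullGuard {

            static void checkNonNull(Object key, Object value) throws IOException {
              if (key == null || value == null) {
                throw new IOException("Received a null key/value in collect(); "
                    + "use NullWritable if an empty key or value is intended");
              }
            }

            private CollectNullGuard() {}
          }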

        Subroto Sanyal made changes -
        Attachment: MAPREDUCE-3516.patch [ 12511487 ]
        Subroto Sanyal made changes -
        Target Version/s: 1.1.0 [ 12317960 ]
        Subroto Sanyal made changes -
        Status: Open [ 1 ] → Patch Available [ 10002 ]
        Subroto Sanyal added a comment -

        The patch is for Hadoop 1.1.0.
        It adds null checks for the key and value, and also removes a few unused imports.

        Subroto Sanyal added a comment -

        Requesting review of the patch.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12511487/MAPREDUCE-3516.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1651//console

        This message is automatically generated.

        Harsh J added a comment -

        Change is fine (maybe add a test case to avoid regressions? It should be easy to extend existing tests to cover this null check).

        By the way, does this not affect trunk? Can we have a patch for that as well, if it does?

        Also, please do not include unrelated changes (such as import statement rearrangements or cleanups). It is alright to file them as a separate JIRA instead.
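
        A self-contained sketch of what such a regression test could look like (the guard below mirrors the hypothetical check sketched earlier in this thread, not the actual MapOutputBuffer code):

          import static org.junit.Assert.assertTrue;
          import static org.junit.Assert.fail;

          import java.io.IOException;
          import org.junit.Test;

          // Stand-alone regression-test sketch for the null check.
          public class TestCollectRejectsNulls {

            // Stand-in for the hypothetical early null check in collect().
            private static void checkNonNull(Object key, Object value) throws IOException {
              if (key == null || value == null) {
                throw new IOException("Received a null key/value in collect(); "
                    + "use NullWritable if an empty key or value is intended");
              }
            }

            @Test
            public void nullKeyIsRejectedWithDescriptiveMessage() {
              try {
                checkNonNull(null, "value");
                fail("expected an IOException for a null key");
              } catch (IOException expected) {
                assertTrue(expected.getMessage().contains("null key/value"));
              }
            }

            @Test
            public void nullValueIsRejectedWithDescriptiveMessage() {
              try {
                checkNonNull("key", null);
                fail("expected an IOException for a null value");
              } catch (IOException expected) {
                assertTrue(expected.getMessage().contains("null key/value"));
              }
            }
          }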

        Harsh J added a comment -

        All fixes and features must first go to trunk if needed. Backports may be discussed later, depending on need and criticality.

        Harsh J made changes -
        Target Version/s: 1.1.0 [ 12317960 ] → 0.24.0 [ 12317654 ]
        Harsh J made changes -
        Status: Patch Available [ 10002 ] → Open [ 1 ]

          People

          • Assignee: Subroto Sanyal
          • Reporter: Subroto Sanyal
          • Votes: 1
          • Watchers: 2
