Hadoop Map/Reduce
  MAPREDUCE-4974

Optimising the LineRecordReader initialize() method

    Details

      Description

      I found there is scope for optimizing the code in initialize(): compressionCodecs and codec need to be instantiated only if the input is compressed.
      Meanwhile, Gelesh George Omathil suggested we could also avoid the null checks on key and value. This would save time, since the null check is currently done for every next key/value generation. The intention is to instantiate the objects only once and avoid NPEs as well. Both goals could be met by initializing key and value in the initialize() method. We have both worked on this.
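The proposed shape of initialize() can be sketched with minimal stand-in classes. This is a sketch only: LineRecordReaderSketch and the tiny LongWritable/Text below are illustrative stand-ins, not the real Hadoop types, and the Object fields stand in for CompressionCodecFactory/CompressionCodec.

```java
// Hypothetical stand-ins for the Hadoop types involved; not the real classes.
class LongWritable { long v; void set(long v) { this.v = v; } }
class Text { String s = ""; }

class LineRecordReaderSketch {
    LongWritable key;
    Text value;
    Object compressionCodecs; // stands in for CompressionCodecFactory
    Object codec;             // stands in for CompressionCodec

    // Proposed shape of initialize(): key/value are created once here (so
    // nextKeyValue() needs no per-call null checks), and the codec objects
    // are instantiated only when the input is compressed.
    void initialize(boolean isCompressedInput) {
        key = new LongWritable();
        value = new Text();
        if (isCompressedInput) {
            compressionCodecs = new Object(); // i.e. new CompressionCodecFactory(job)
            codec = new Object();             // i.e. compressionCodecs.getCodec(file)
        }
    }
}
```

For an uncompressed split, the codec fields simply remain null and no factory is ever built.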

      1. MAPREDUCE-4974.2.patch
        12 kB
        Arun A K
      2. MAPREDUCE-4974.3.patch
        3 kB
        Gelesh
      3. MAPREDUCE-4974.4.patch
        2 kB
        Gelesh
      4. MAPREDUCE-4974.5.patch
        3 kB
        Gelesh

        Activity

        Arun A K created issue -
        Gelesh made changes -
        Field Original Value New Value
        Status Open [ 1 ] Patch Available [ 10002 ]
        Target Version/s 1.0.0, 1.0.4, 1.1.1, 2.0.0-alpha, 2.0.1-alpha, 0.23.4, 0.23.5 [ 12318240, 12323325, 12321660, 12320354, 12322466, 12323264, 12323312 ] 0.23.5, 0.23.4, 2.0.1-alpha, 2.0.0-alpha, 1.1.1, 1.0.4, 1.0.0 [ 12323312, 12323264, 12322466, 12320354, 12321660, 12323325, 12318240 ]
        Assignee Arun A K [ ak.arun@aol.com ] Gelesh [ gelesh ]
        Gelesh added a comment -

        Combined thoughts of mine and Arun A K's.

        Gelesh made changes -
        Attachment MAPREDUCE-4974.1.patch [ 12567831 ]
        Gelesh added a comment -

        Could somebody please review the patch?
        I could not even see Hadoop QA running on this.
        Kindly advise.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12567831/MAPREDUCE-4974.1.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3297//testReport/
        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3297//console

        This message is automatically generated.

        Gelesh added a comment -

        It is an improvement to the existing code; no new features are added or removed,
        and hence the existing test cases should suffice.

        Arun A K made changes -
        Summary optimising the LineRecordReader initialize method Optimising the LineRecordReader initialize() method
        Arun A K added a comment -

        Quoting the review request URL for this issue - https://reviews.apache.org/r/9287/

        Todd Lipcon added a comment -

        Do you have any benchmark that shows this helps? Null checks can often be completely optimized out by the JIT, or at least hoisted out of the tight loop.

        Gelesh added a comment -

        Todd Lipcon,
        nextKeyValue() is called once for every occurrence of the delimiter (or newline) within a given split.
        Each time, it executes the code below:

            if (key == null) {
              key = new LongWritable();
            }
            key.set(pos);
            if (value == null) {
              value = new Text();
            }

        Only on the first iteration does the condition hold true and the key and value objects get created.
        If the key and value objects are created in the initialize phase instead, this null check can be skipped.

        Also,

            compressionCodecs = new CompressionCodecFactory(job);
            codec = compressionCodecs.getCodec(file);

        need to be done only when a compressed input file is used. This change is also included.
        Gelesh added a comment -

        Todd Lipcon
        I tried an estimate locally, with small data, by subtracting the long values obtained from System.nanoTime() at the beginning and at the end of the method.

        The average difference was 200 nanoseconds for each call made to nextKeyValue(), excluding the very first call, since that one involves the object creation.

        The total time saved would be 200 ns times the number of key/value pairs generated per map task.
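The measurement described above can be sketched roughly as follows. NullCheckTiming and guardedCall are hypothetical names for illustration; and note that, as Todd points out, the JIT may optimize the null check away entirely, and the nanoTime() calls themselves add overhead, so numbers from such a micro-benchmark should be read with caution.

```java
class NullCheckTiming {
    static Object key; // lazily created, as in the current nextKeyValue()

    // Mimics the guarded path: a null check runs on every call,
    // but the object is only created on the first one.
    static long guardedCall(long pos) {
        if (key == null) { key = new Object(); }
        return pos;
    }

    public static void main(String[] args) {
        final int calls = 1_000_000;
        long total = 0;
        for (int i = 0; i < calls; i++) {
            long start = System.nanoTime();
            guardedCall(i);
            long elapsed = System.nanoTime() - start;
            if (i > 0) total += elapsed; // skip the first call: it creates the object
        }
        System.out.println("avg ns/call (excluding first): " + total / (calls - 1));
    }
}
```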

        Arun A K added a comment -

        Kindly advise whether the optimization is worthwhile.

        Surenkumar Nihalani added a comment -

        +1

        • Initializing objects on a null-reference check looks insecure when one reads the code. This change strengthens the code base.
        • LineRecordReader is used for every map task. The null-checking portion goes from O -> O(1). It's worth it in that sense.
        Surenkumar Nihalani added a comment -

        One advantage we also might not be able to quantify easily would be in garbage collection.

        I was looking through the implementation of LineRecordReader.nextKeyValue; at the end of parsing, we set key and value to null. In the next call, they will be null, and if the while condition getFilePosition() <= end evaluates to true, we'll hit an NPE, because in.readLine(Text t, ...) does t.clear() first. Would there be any case in which the condition evaluates to true even when we are done?
        I know this probably won't happen because the user will wrap this in a while (nextKeyValue()) loop; I just wanted to be sure that we won't hit NPEs after this change, even for buggy programs.

        Gelesh added a comment -

        Surenkumar Nihalani,

        " .. while condition of getFilePosition <= end evaluates to true, then, we'll hit NPE .."
        The Text object value passed to readLine would not be null, since that is taken care of in the initialize() method, which is called prior to nextKeyValue().

        The while (nextKeyValue()) loop ends as soon as newSize (the size of the newly fetched value) equals zero.
        At that point key and value are set to null,
        but they are not referred to any more after the while (nextKeyValue()) loop, so an NPE is not likely to occur.

        Please verify, and kindly correct me if we have gone wrong somewhere.

        Surenkumar Nihalani added a comment -

        I am referring to a buggy program that calls nextKeyValue() even after it has returned false. I just wanted to be sure that the check in the while loop will guard us from calling in.readLine(); when in.readLine is passed a null Text, we will hit an NPE. That's the case I am trying to be safe about.

        Gelesh added a comment -

        As Surenkumar Nihalani has mentioned, a buggy program may call nextKeyValue() again;
        the condition, though a little hypothetical, is still possible.

        1) In order to avoid that, shall we move the null assignment of key and value to the close() method?
        2) Also, shall we have compressionCodecs assigned to null there as well?

        Either I or Arun A K will upload a rework of the same.

        Gelesh added a comment -

        Also, this change instantiates the compression-related objects only if the input is a compressed file.

        In order to skip the first line, a readLine call is made; this change does not create a new Text but reuses the available 'value' object for that call.

        Hope somebody can share their thoughts on these two changes as well.

        Arun A K added a comment -

        If someone could add their review comments, we could look into the mentioned changes. https://reviews.apache.org/r/9287/

        Karthik Kambatla (Inactive) added a comment -

        Looks like calling #nextKeyValue() after it has already returned false will throw an NPE. IIUC, it should continue to return false.
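One way to guarantee that behavior, sketched here with hypothetical names rather than the real LineRecordReader, is to guard repeated calls with a flag so the reader keeps returning false after exhaustion instead of touching nulled-out key/value objects:

```java
class SafeReaderSketch {
    private final String[] lines; // stands in for the split's records
    private int pos = -1;
    private boolean finished = false;
    String currentValue;

    SafeReaderSketch(String[] lines) { this.lines = lines; }

    // Keeps returning false on every call after exhaustion, instead of
    // re-entering the read path and hitting an NPE.
    boolean nextKeyValue() {
        if (finished) {
            return false; // guard for buggy callers that keep calling
        }
        pos++;
        if (pos >= lines.length) {
            finished = true;
            currentValue = null; // release the reference for GC
            return false;
        }
        currentValue = lines[pos];
        return true;
    }
}
```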

        Karthik Kambatla (Inactive) added a comment -

        Removing fix versions - usually, the committer sets these at commit time.

        Karthik Kambatla (Inactive) made changes -
        Fix Version/s 0.20.204.0 [ 12316318 ]
        Fix Version/s 0.24.0 [ 12317654 ]
        Target Version/s 1.0.0, 1.0.4, 1.1.1, 2.0.0-alpha, 2.0.1-alpha, 0.23.4, 0.23.5 [ 12318240, 12323325, 12321660, 12320354, 12322466, 12323264, 12323312 ] 0.23.5, 0.23.4, 2.0.1-alpha, 2.0.0-alpha, 1.1.1, 1.0.4, 1.0.0 [ 12323312, 12323264, 12322466, 12320354, 12321660, 12323325, 12318240 ]
        Arun A K added a comment -

        The key and value null assignment in nextKeyValue() has been moved to close() to avoid the NPE, as per the review comments.

        Also, the if (newSize == 0) check has been removed inside the loop,
        since if (newSize < maxLineLength) includes the same check.
        However, the (newSize == 0) condition is still checked outside the while loop. Hope this also improves performance.

        Combined effort with Gelesh.

        Arun A K made changes -
        Attachment MAPREDUCE-4974.2.patch [ 12568811 ]
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12568811/MAPREDUCE-4974.2.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3323//testReport/
        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3323//console

        This message is automatically generated.

        Gelesh added a comment -

        Hadoop QA,
        as mentioned before, this is just an improvement;
        no new features are added or removed.
        The existing test cases hold for this as well.

        Karthik Kambatla (Inactive) added a comment -

        The latest approach seems correct at first glance. However, the patch seems to modify the entire file, possibly due to re-formatting. It would really help if it showed only the changed code.

        Gelesh added a comment -

        Arun A K, it seems reformatting has shown the entire file as new lines instead of the changes alone.
        Karthik Kambatla, do we really need to re-upload the patch so that its size is reduced? If not, could you please act on it or advise on the course of action?
        If some changes to the code are required, please mention that too; our next patch will incorporate them.

        Gelesh added a comment -

        Arun A K's patch 4974.2 had shown all lines as new lines because of code reformatting. The same changes were captured, and a patch was built against the previous commit. This time the patch size is 3+ kB. Please review.

        Gelesh made changes -
        Attachment MAPREDUCE-4974.3.patch [ 12568943 ]
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12568943/MAPREDUCE-4974.3.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3327//testReport/
        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3327//console

        This message is automatically generated.

        Gelesh added a comment -

        The existing test cases are enough, because this is just a code optimization.
        Could anybody have a look and comment, please?

        Arun A K added a comment -

        Please find the review request. https://reviews.apache.org/r/9440/

        Gelesh added a comment -

        Todd Lipcon, Surenkumar Nihalani, Karthik Kambatla
        Please share your thoughts

        Surenkumar Nihalani added a comment -

        I don't think we return null any more if there is no key. So it no longer follows the contract at http://hadoop.apache.org/docs/r1.0.4/api/org/apache/hadoop/mapreduce/RecordReader.html#getCurrentKey%28%29

        Can you confirm it works?

        Gelesh added a comment -

        Surenkumar Nihalani,
        Thanks for bringing up that very valid point.
        In that case, what if we eliminate the null check for value alone,
        and keep the null check for key as is?

        Gelesh added a comment -

        And also, as Arun A K has mentioned:
        1) Avoid the ' if (newSize == 0) ' check inside the loop.
        2) Have ' compressionCodecs & codec instantiated only if its a compressed input. '

        Hope these two points are valid.
        Please share your thoughts...

        Surenkumar Nihalani added a comment -

        I don't see the null check for key/value being the beneficial part of the optimization. Can you post the patch and I'll review it?

        I agree with the change

        2) if we have ' compressionCodecs & codec instantiated only if its a compressed input. '

        Gelesh added a comment -

        Two changes:

        1)

            if (newSize == 0) {
              break;
            }

            if (newSize < maxLineLength) {
              break;
            }

        The newSize == 0 check is eliminated, since the
        (newSize < maxLineLength) check includes that condition as well.
        The (newSize == 0) check outside the loop is retained as is.

        2)

            compressionCodecs = new CompressionCodecFactory(job);
            codec = compressionCodecs.getCodec(file);

        These lines of code are placed inside the
        if (isCompressedInput()) { } block,
        so that these objects are instantiated only if the input file is in a compressed format.
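The first change relies on the fact that, with a positive maxLineLength, newSize == 0 already implies newSize < maxLineLength, so a single comparison can serve as the loop's break condition. A minimal check of that reasoning (shouldBreak is a hypothetical helper, not code from the patch):

```java
class BreakConditionCheck {
    // With maxLineLength > 0, newSize == 0 already satisfies
    // newSize < maxLineLength, so this single comparison covers
    // both of the original break conditions inside the loop.
    static boolean shouldBreak(int newSize, int maxLineLength) {
        return newSize < maxLineLength;
    }
}
```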

        Gelesh made changes -
        Attachment MAPREDUCE-4974.4.patch [ 12570453 ]
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12570453/MAPREDUCE-4974.4.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3355//testReport/
        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3355//console

        This message is automatically generated.

        Gelesh added a comment -

        Surenkumar Nihalani,
        Please review,

        Surenkumar Nihalani added a comment -

        Is this the final patch you are proposing to submit? It seems like it's a diff of only the last comment's changes.

        Gelesh added a comment -

        Arun A K and I are of the opinion that we should also do something about the repeated null check.
        As per the discussions here, that part of the optimization seems unattractive, so in the latest patch we have eliminated the null-check change. The remaining changes are described in the comment above. Please review.

        Arun A K added a comment -

        As Gelesh has mentioned, we had the elimination of repeated null checks in mind while trying to optimize the code. If it is not of much significance, please go ahead with the latest available patch containing the rest of the changes.

        Surenkumar Nihalani added a comment -

        The one line of code that seems to be missing is key.set(pos). Where is that being handled?

        Gelesh added a comment -

        Surenkumar Nihalani, I think you referred to an old patch.
        Please look at MAPREDUCE-4974.4.patch.

        Gelesh made changes -
        Attachment MAPREDUCE-4974.1.patch [ 12567831 ]
        Surenkumar Nihalani added a comment -

        Gelesh, can you put this on the review board and give me a link to look at the final copy of the file, like last time?

        Thanks.

        Gelesh added a comment -

        Arun A K has put this patch on the review board (thanks, AK).
        Surenkumar Nihalani, please refer to this link to visualize the patch difference:
        https://reviews.apache.org/r/9440/diff/#index_header

        Arun A K added a comment -

        Updated the review request URL with the latest patch. Please find the same at https://reviews.apache.org/r/9440/

        Surenkumar Nihalani added a comment -

        +1

        Karthik Kambatla (Inactive) added a comment -

        +1

        Arun A K added a comment -

        @ All, Can we mark this issue as resolved?

        Surenkumar Nihalani added a comment -

        It doesn't work that way. One of the committers will take a look at this, commit it to the codebase and then it will be marked as resolved.

        Gelesh made changes -
        Rank Ranked higher
        Gelesh added a comment -

        Robert Joseph Evans, Jason Lowe,
        Could any of you please act on this?

        Robert Joseph Evans added a comment -

        Sorry it has taken me so long to take a look.

        The patch looks fine to me too. All of the optimizations look fine and seem to not change anything with the existing behavior. +1. I'll check it in. If you want similar changes in the 1.0 line you will need to post a separate patch for that.

        Robert Joseph Evans added a comment -

        Thanks Gelesh,

        I put this into trunk, branch-2, and branch-0.23

        Robert Joseph Evans made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Target Version/s 1.0.0, 1.0.4, 1.1.1, 2.0.0-alpha, 2.0.1-alpha, 0.23.4, 0.23.5 [ 12318240, 12323325, 12321660, 12320354, 12322466, 12323264, 12323312 ] 0.23.5, 0.23.4, 2.0.1-alpha, 2.0.0-alpha, 1.1.1, 1.0.4, 1.0.0 [ 12323312, 12323264, 12322466, 12320354, 12321660, 12323325, 12318240 ]
        Fix Version/s 0.23.7 [ 12323954 ]
        Fix Version/s 2.0.5-beta [ 12324032 ]
        Resolution Fixed [ 1 ]
        Hudson added a comment -

        Integrated in Hadoop-trunk-Commit #3542 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3542/)
        MAPREDUCE-4974. Optimising the LineRecordReader initialize() method (Gelesh via bobby) (Revision 1463221)

        Result = SUCCESS
        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463221
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/LineRecordReader.java
        Robert Joseph Evans added a comment -

        I am sorry about reopening this, but I did not take a close enough look at it before I put it in.

        The compression code cannot be moved. isCompressedInput() uses the value of codec internally. After this change compression is always off for every input format, because codec is never set and is always null. I am happy to leave the other half of the change in place. I will push the change to subversion shortly.

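        The dependency described above can be illustrated with a minimal, Hadoop-free sketch. All names below are hypothetical stand-ins for the real LineRecordReader and CompressionCodec types, not the actual patch: once the codec becomes a local variable inside initialize(), the field that isCompressedInput() consults is never assigned, so it stays null.

        ```java
        // Simplified, Hadoop-free sketch of the problem: the field `codec` is
        // consulted by isCompressedInput(), so it cannot be demoted to a local.
        public class CodecScopeSketch {
            static class BrokenReader {
                // In the real class this would be a CompressionCodec; after the
                // patch it is never assigned and remains null.
                private Object codec;

                void initialize(boolean inputIsCompressed) {
                    // Patched version: codec computed as a local and discarded.
                    Object localCodec = inputIsCompressed ? new Object() : null;
                    // The field `codec` is left null here.
                }

                // Mirrors the trunk implementation: answers based on the field,
                // so after the patched initialize() this is always false.
                boolean isCompressedInput() {
                    return codec != null;
                }
            }

            public static void main(String[] args) {
                BrokenReader r = new BrokenReader();
                r.initialize(true); // even for compressed input...
                System.out.println(r.isCompressedInput()); // prints "false"
            }
        }
        ```

        This matches the observed failure: compression is treated as off for every input, regardless of the file's actual codec.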
        Robert Joseph Evans made changes -
        Resolution Fixed [ 1 ]
        Status Resolved [ 5 ] Reopened [ 4 ]
        Gera Shegalov added a comment -

        I already wondered why my comment on reviewboard about this is being ignored.

        Robert Joseph Evans added a comment - - edited

        I pulled the patch out. Sorry Gera I did not see the review board comments. I will try to be more complete next time. All of the other times that I have used review board the comments were posted to the JIRA, but I guess that is not the case any more. Either way it is my fault.

        The test that caught this too is TestMRKeyValueTextInputFormat.

        Hudson added a comment -

        Integrated in Hadoop-trunk-Commit #3546 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3546/)
        Reverted MAPREDUCE-4974 because of test failures. (Revision 1463359)

        Result = SUCCESS
        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463359
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/LineRecordReader.java
        Hudson added a comment -

        Integrated in Hadoop-Yarn-trunk #173 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/173/)
        Reverted MAPREDUCE-4974 because of test failures. (Revision 1463359)
        MAPREDUCE-4974. Optimising the LineRecordReader initialize() method (Gelesh via bobby) (Revision 1463221)

        Result = SUCCESS
        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463359
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/LineRecordReader.java

        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463221
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/LineRecordReader.java
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-0.23-Build #571 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/571/)
        Reverted MAPREDUCE-4974 because of test failures. (Revision 1463361)
        svn merge -c 1463221 FIXES: MAPREDUCE-4974. Optimising the LineRecordReader initialize() method (Gelesh via bobby) (Revision 1463224)

        Result = UNSTABLE
        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463361
        Files :

        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/LineRecordReader.java

        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463224
        Files :

        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/LineRecordReader.java
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #1362 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1362/)
        Reverted MAPREDUCE-4974 because of test failures. (Revision 1463359)
        MAPREDUCE-4974. Optimising the LineRecordReader initialize() method (Gelesh via bobby) (Revision 1463221)

        Result = FAILURE
        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463359
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/LineRecordReader.java

        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463221
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/LineRecordReader.java
        Gelesh added a comment -

        Gera Shegalov, I too apologise for not noticing your comments on the review board. I did not have much idea about the review board, and was expecting the review comments here (JIRA).

        Thanks for sharing your thoughts.

        Gelesh added a comment -

        Gera Shegalov, Robert Joseph Evans,
        I would suggest making isCompressedInput a private boolean variable, false by default, instead of the isCompressedInput() method.
        This would let us reduce the scope of the codec object, along with the CompressionCodecFactory object, to method-local; as of now they are class variables.

        I will post a patch with this modification shortly.
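        The proposal can be sketched with a small, self-contained simulation. This is an illustration only, not the committed patch: the isCompressedInput field name follows the proposal, but the reader class is hypothetical and the file-extension check stands in for CompressionCodecFactory.getCodec().

        ```java
        // Self-contained sketch of the proposed fix: compute the codec inside
        // initialize(), record only a boolean, and let the codec objects go out
        // of scope instead of keeping them as class fields.
        public class BooleanFlagSketch {
            static class FixedReader {
                // Replaces the isCompressedInput() method; false by default.
                boolean isCompressedInput = false;

                void initialize(String fileName) {
                    // Stand-in for CompressionCodecFactory.getCodec(file):
                    // returns a codec object only for a known compressed suffix.
                    Object codec = fileName.endsWith(".gz") ? new Object() : null;
                    // Record the fact we need later, then let codec go out of scope.
                    isCompressedInput = (codec != null);
                }
            }

            public static void main(String[] args) {
                FixedReader r = new FixedReader();
                r.initialize("part-00000.gz");
                System.out.println(r.isCompressedInput); // prints "true"
                r.initialize("part-00000.txt");
                System.out.println(r.isCompressedInput); // prints "false"
            }
        }
        ```

        The design point is that only the derived fact (compressed or not) outlives initialize(); the factory and codec objects become garbage-collectable immediately.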

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #1389 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1389/)
        Reverted MAPREDUCE-4974 because of test failures. (Revision 1463359)
        MAPREDUCE-4974. Optimising the LineRecordReader initialize() method (Gelesh via bobby) (Revision 1463221)

        Result = SUCCESS
        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463359
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/LineRecordReader.java

        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463221
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/LineRecordReader.java
        Gelesh added a comment -

        Reduced the scope of compressionCodecs & codec.

        Introduced a boolean isCompressedInput field instead of the boolean isCompressedInput() method.

        Gelesh made changes -
        Status Reopened [ 4 ] Patch Available [ 10002 ]
        Target Version/s 1.0.0, 1.0.4, 1.1.1, 2.0.0-alpha, 2.0.1-alpha, 0.23.4, 0.23.5 [ 12318240, 12323325, 12321660, 12320354, 12322466, 12323264, 12323312 ] 0.23.5, 0.23.4, 2.0.1-alpha, 2.0.0-alpha, 1.1.1, 1.0.4, 1.0.0 [ 12323312, 12323264, 12322466, 12320354, 12321660, 12323325, 12318240 ]
        Gelesh added a comment -

        The CompressionCodecFactory compressionCodecs and CompressionCodec codec objects were made local to initialize(), and a private boolean isCompressedInput was introduced instead of the private boolean isCompressedInput() method.

        Gelesh made changes -
        Attachment MAPREDUCE-4974.5.patch [ 12576759 ]
        Gelesh added a comment -

        The Hadoop QA automated check has yet to act on patch .5. Kindly advise.
        Please refer to the review board: https://reviews.apache.org/r/9440/diff/3/

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12576759/MAPREDUCE-4974.5.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3494//testReport/
        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3494//console

        This message is automatically generated.

        Gelesh added a comment -

        Since this is just an optimization and the existing test cases should suffice, I hope this is a +1.
        Could somebody kindly review?

        Gelesh added a comment -

        Jason Lowe, Robert Joseph Evans, Gera Shegalov, Surenkumar Nihalani, Karthik Kambatla,
        could anybody please share views or suggestions?

        Gera Shegalov added a comment -

        Gelesh, I am not a committer. It looks fine to me. While waiting for committers' review, if you have not done so yet you probably should rerun locally the inputformat-related test suites to make sure nothing is broken. And you should maybe also try to quantify the performance improvement using some local MR job.

        Gelesh added a comment -

        Gera Shegalov, thanks for sharing your thoughts.

        I have tested with a JUnit run of TestLineRecordReader, but as of now a test case for compressed input is not incorporated in TestLineRecordReader. That is a place we need to cross-check, but I hope the code will hold good, because the modification in this area is minimal.

        The aim was to enhance performance by removing the null check, but an incompatibility with any build that relies on the existing behaviour could give an NPE, as discussed above (see Surenkumar Nihalani's comments).

        The patch was limited to:
        1) removing the null assignments for the key & value;
        2) limiting CompressionCodecFactory and the codec to method-local scope;
        3) removing lines 170-173,

        if (newSize == 0)

        { break; }

        an unnecessary == 0 check inside a loop: the code that handles this case already exists outside the loop, and the duplicate check adds no value;
        4) in order to achieve point 2, introducing a private boolean isCompressedInput variable instead of the
        private boolean isCompressedInput();
        method.
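        Point 3 can be checked with a small, self-contained simulation (simplified, assumed shapes: readLine() is replaced by an array of line sizes, and end-of-stream returns 0). When newSize is 0, the later newSize < maxLineLength test breaks out of the loop anyway, so the dedicated zero check is redundant.

        ```java
        // Sketch of the read loop in nextKeyValue() with the `newSize == 0`
        // check removed; the remaining break condition still terminates the
        // loop at end of stream, and the return value is unchanged.
        public class LoopCheckSketch {
            // lineSizes[i] stands in for in.readLine(...); 0 means end of stream.
            static int simulateLoop(int[] lineSizes, int maxLineLength) {
                long pos = 0;
                int newSize = 0;
                for (int i = 0; ; i++) {
                    newSize = (i < lineSizes.length) ? lineSizes[i] : 0;
                    // Removed check: if (newSize == 0) break;  -- redundant, because
                    pos += newSize;                     // adding 0 changes nothing and
                    if (newSize < maxLineLength) break; // 0 < maxLineLength breaks here
                    // otherwise the line was too long; keep reading to skip it
                }
                return newSize; // 0 still signals "no record", exactly as before
            }

            public static void main(String[] args) {
                System.out.println(simulateLoop(new int[] {80}, 1024));       // one normal line: 80
                System.out.println(simulateLoop(new int[] {}, 1024));         // end of stream: 0
                System.out.println(simulateLoop(new int[] {2048, 80}, 1024)); // skip long line: 80
            }
        }
        ```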

        Robert Joseph Evans added a comment -

        The code changes look fine to me too, but I want to run a few more tests than the precommit build does. I just don't want to have to pull it out yet again.

        Robert Joseph Evans added a comment -

        The changes look fine and the tests passed, so I am going to check this into trunk and branch-2.

        Robert Joseph Evans added a comment -

        Thanks Gelesh,

        I put this in trunk and branch-2.

        Robert Joseph Evans made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Target Version/s 1.0.0, 1.0.4, 1.1.1, 2.0.0-alpha, 2.0.1-alpha, 0.23.4, 0.23.5 [ 12318240, 12323325, 12321660, 12320354, 12322466, 12323264, 12323312 ] 0.23.5, 0.23.4, 2.0.1-alpha, 2.0.0-alpha, 1.1.1, 1.0.4, 1.0.0 [ 12323312, 12323264, 12322466, 12320354, 12321660, 12323325, 12318240 ]
        Fix Version/s trunk [ 12320360 ]
        Fix Version/s 0.23.7 [ 12323954 ]
        Resolution Fixed [ 1 ]
        Hudson added a comment -

        Integrated in Hadoop-trunk-Commit #3614 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3614/)
        MAPREDUCE-4974. Optimising the LineRecordReader initialize() method (Gelesh via bobby) (Revision 1468232)

        Result = SUCCESS
        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1468232
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/LineRecordReader.java
        Hudson added a comment -

        Integrated in Hadoop-Yarn-trunk #185 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/185/)
        MAPREDUCE-4974. Optimising the LineRecordReader initialize() method (Gelesh via bobby) (Revision 1468232)

        Result = SUCCESS
        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1468232
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/LineRecordReader.java
        Sonu Prathap added a comment -

        Could somebody kindly check and update the test case in TestLineRecordReader to cover compressed input, and recheck MAPREDUCE-4974?

        Please refer to MAPREDUCE-5143.

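The kind of check requested above boils down to a round trip: write gzip-compressed line data, read it back through a codec-aware path, and verify the lines survive intact. The sketch below is illustrative only — it is not the actual TestLineRecordReader; the class CompressedInputCheck and its helpers readLines/writeGzip are hypothetical names, and the .gz suffix test stands in for Hadoop's codec detection.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Illustrative compressed-input round-trip check; not Hadoop test code.
class CompressedInputCheck {

    // Read all lines, decompressing only when the file is gzip-compressed.
    static List<String> readLines(File file) throws IOException {
        InputStream raw = new FileInputStream(file);
        InputStream stream = file.getName().endsWith(".gz")
                ? new GZIPInputStream(raw) : raw;
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(stream))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Write the given lines into a temporary gzip file.
    static File writeGzip(String... lines) throws IOException {
        File f = File.createTempFile("input", ".gz");
        try (PrintWriter w = new PrintWriter(
                new GZIPOutputStream(new FileOutputStream(f)))) {
            for (String l : lines) {
                w.println(l);
            }
        }
        return f;
    }
}
```

A real test in TestLineRecordReader would additionally exercise the reader through splits, but the core assertion — compressed input yields the same records as plain text — is the same.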
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #1374 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1374/)
        MAPREDUCE-4974. Optimising the LineRecordReader initialize() method (Gelesh via bobby) (Revision 1468232)

        Result = FAILURE
        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1468232
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/LineRecordReader.java
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #1401 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1401/)
        MAPREDUCE-4974. Optimising the LineRecordReader initialize() method (Gelesh via bobby) (Revision 1468232)

        Result = SUCCESS
        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1468232
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/LineRecordReader.java
        Sachin Jose made changes -
        Affects Version/s 0.23.5 [ 12323312 ]
        Arun C Murthy made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Allen Wittenauer made changes -
        Fix Version/s trunk [ 12320360 ]
        Transition | Time In Source Status | Execution Times | Last Executer | Last Execution Date
        Open → Patch Available | 5m 12s | 1 | Gelesh | 04/Feb/13 12:31
        Resolved → Reopened | 8h 13m | 1 | Robert Joseph Evans | 02/Apr/13 03:03
        Reopened → Patch Available | 1d 8h 24m | 1 | Gelesh | 03/Apr/13 11:28
        Patch Available → Resolved | 68d 16h 27m | 2 | Robert Joseph Evans | 15/Apr/13 22:37
        Resolved → Closed | 134d 45m | 1 | Arun C Murthy | 27/Aug/13 23:22

          People

          • Assignee: Gelesh
          • Reporter: Arun A K
          • Votes: 1
          • Watchers: 13

              Time Tracking

              • Original Estimate: 1h
              • Remaining Estimate: 1h
              • Time Spent: Not Specified