Hadoop Map/Reduce / MAPREDUCE-1820

InputSampler does not create a deep copy of the key object when creating a sample, which causes problems with some formats like SequenceFile<Text,Text>

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.22.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      I tried to use the InputSampler on a SequenceFile<Text,Text> and found that it produces duplicate keys in the sample. I traced the problem to the fact that the Text object returned by the reader is essentially a wrapper around a byte array whose contents change as the sequence file reader advances, so samples stored by reference all end up showing the same value. There was also a second bug: the reader was not initialized before use. I am attaching a patch that fixes both issues. --Alex K
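The aliasing problem described above can be reproduced without Hadoop. The following is a minimal sketch, not the code from the attached patches: `MutableKey` is a hypothetical stand-in for a reused key object such as Text, and `SamplerCopyDemo` and `sample` are names invented for this illustration. It compares storing the reader's reused object directly with storing a deep copy of it.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SamplerCopyDemo {

    /** Hypothetical stand-in for a mutable key that a reader reuses (like Text). */
    static final class MutableKey {
        private byte[] bytes = new byte[0];

        /** Overwrites the contents in place, as a reader does on each record. */
        void set(String s) {
            bytes = s.getBytes(StandardCharsets.UTF_8);
        }

        /** Returns an independent copy with its own byte array. */
        MutableKey copy() {
            MutableKey k = new MutableKey();
            k.bytes = bytes.clone();
            return k;
        }

        @Override
        public String toString() {
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    /** Collects every record as a "sample", with or without a deep copy. */
    public static List<String> sample(String[] records, boolean deepCopy) {
        MutableKey reused = new MutableKey(); // the single instance the reader hands back
        List<MutableKey> samples = new ArrayList<>();
        for (String record : records) {
            reused.set(record); // the reader advances and overwrites the key
            samples.add(deepCopy ? reused.copy() : reused);
        }
        List<String> out = new ArrayList<>();
        for (MutableKey k : samples) {
            out.add(k.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        String[] records = {"a", "b", "c"};
        System.out.println(sample(records, false)); // [c, c, c]: all entries alias one object
        System.out.println(sample(records, true));  // [a, b, c]: each sample is independent
    }
}
```

In Hadoop itself, a Text key can be copied with its copy constructor (`new Text(key)`), and `ReflectionUtils.copy` handles arbitrary Writable types. Which approach the attached patches take is not stated in this description.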

      1. MAPREDUCE-1820.patch
        6 kB
        Alex Kozlov
      2. MAPREDUCE-1820-2.patch
        6 kB
        Alex Kozlov
      3. MAPREDUCE-1820-3.patch
        6 kB
        Alex Kozlov
      4. M1820-4.patch
        11 kB
        Chris Douglas
      5. M1820-5.patch
        11 kB
        Chris Douglas

          Activity

          Alex Kozlov created issue -
          Alex Kozlov made changes -
          Field Original Value New Value
          Status Open [ 1 ] Patch Available [ 10002 ]
          Alex Kozlov made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Assignee Alex Kozlov [ alexvk ]
          Alex Kozlov added a comment -

          A simple patch that clones the objects inserted into the samples collection.

          Alex Kozlov made changes -
          Attachment MAPREDUCE-1820.patch [ 12445594 ]
          Alex Kozlov made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12445594/MAPREDUCE-1820.patch
          against trunk revision 947758.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          -1 javac. The patch appears to cause tar ant target to fail.

          -1 findbugs. The patch appears to cause Findbugs to fail.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/206/testReport/
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/206/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/206/console

          This message is automatically generated.

          Alex Kozlov added a comment -

          Fix syntactic issues...

          Alex Kozlov made changes -
          Attachment MAPREDUCE-1820-2.patch [ 12445604 ]
          Alex Kozlov made changes -
          Summary (original): InputSampler does not create a deep copy of the key class when creating a sample, which causes problems with some formats like SequenceFile<Text,Text>
          Summary (new): InputSampler does not create a deep copy of the key object when creating a sample, which causes problems with some formats like SequenceFile<Text,Text>
          Alex Kozlov added a comment -

          A small fix to IntervalSampler...

          Alex Kozlov made changes -
          Attachment MAPREDUCE-1820-3.patch [ 12445629 ]
          Chris Douglas added a comment -

          I missed this in MAPREDUCE-366 when the samplers were converted to the new API.

          • The changes to writePartitionFile look like whitespace and debugging info; could they be removed?
          • A unit test would have prevented this regression. Would you mind writing one for the samplers?
          Chris Douglas made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Jeff Hammerbacher made changes -
          Link This issue is related to MAPREDUCE-366 [ MAPREDUCE-366 ]
          Chris Douglas added a comment -

          Added a unit test. Ideally, this should be in 0.21.

          Chris Douglas made changes -
          Attachment M1820-4.patch [ 12448832 ]
          Chris Douglas made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12448832/M1820-4.patch
          against trunk revision 960808.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 2 new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/593/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/593/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/593/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/593/console

          This message is automatically generated.

          Chris Douglas added a comment -

          Fixed findbugs warnings.

          Chris Douglas made changes -
          Attachment M1820-5.patch [ 12448863 ]
          Chris Douglas made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Chris Douglas made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12448863/M1820-5.patch
          against trunk revision 960808.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/290/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/290/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/290/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/290/console

          This message is automatically generated.

          Chris Douglas added a comment -

          +1

          I committed this. Thanks, Alex!

          Chris Douglas made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags [Reviewed]
          Fix Version/s 0.22.0 [ 12314184 ]
          Resolution Fixed [ 1 ]
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #523 (See https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/523/)

          Hudson added a comment -

          Integrated in HBase-TRUNK #1701 (See https://hudson.apache.org/hudson/job/HBase-TRUNK/1701/)
          HBASE-3392. Update backport of InputSampler to reflect MAPREDUCE-1820

          Konstantin Shvachko made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Zhijie Shen made changes -
          Link This issue is related to MAPREDUCE-5225 [ MAPREDUCE-5225 ]
          Transition                    Time In Source Status   Execution Times   Last Executer         Last Execution Date
          Patch Available -> Open       10d 12h 2m              3                 Chris Douglas         07/Jul/10 08:29
          Open -> Patch Available       30d 20h 57m             4                 Chris Douglas         07/Jul/10 08:30
          Patch Available -> Resolved   16h 41m                 1                 Chris Douglas         08/Jul/10 01:11
          Resolved -> Closed            522d 6h 8m              1                 Konstantin Shvachko   12/Dec/11 06:19

            People

            • Assignee:
              Alex Kozlov
              Reporter:
              Alex Kozlov
            • Votes:
              0
              Watchers:
              2

              Dates

              • Created:
                Updated:
                Resolved:

                Time Tracking

                Estimated: 2h
                Remaining: 2h
                Logged: Not Specified
