Hadoop Map/Reduce: MAPREDUCE-2021

CombineFileInputFormat returns duplicate hostnames in split locations

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.2
    • Fix Version/s: 0.22.0
    • Component/s: client
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      CombineFileInputFormat.getSplits creates splits with duplicate locations. It adds the locations of the files in the split to an ArrayList, so if all the files are on the same host, that host is added again and again. The locations should be collected in a Set instead of a List to avoid duplicates.
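
The following is a minimal, self-contained sketch of the List-versus-Set pattern described above. It is not the code from patch-2021.txt. The class and method names (SplitLocationsDemo, hostsWithList, hostsWithSet) are hypothetical, and each block's hosts are modelled as a plain String[] rather than Hadoop's BlockLocation. Using LinkedHashSet is an assumption: the description only says "a Set", and LinkedHashSet additionally keeps hosts in first-seen order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not the actual CombineFileInputFormat code.
public class SplitLocationsDemo {

    // Buggy pattern: every host of every block is appended to a List, so a
    // host that stores several blocks of the split appears several times.
    public static String[] hostsWithList(String[][] blockHosts) {
        List<String> hosts = new ArrayList<>();
        for (String[] perBlock : blockHosts) {
            hosts.addAll(Arrays.asList(perBlock));
        }
        return hosts.toArray(new String[0]);
    }

    // Fixed pattern: a Set silently drops repeated hosts. LinkedHashSet
    // (an assumption here) also preserves the order in which hosts were first seen.
    public static String[] hostsWithSet(String[][] blockHosts) {
        Set<String> hosts = new LinkedHashSet<>();
        for (String[] perBlock : blockHosts) {
            hosts.addAll(Arrays.asList(perBlock));
        }
        return hosts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Three blocks of one split, all replicated on the same two hosts.
        String[][] blockHosts = {
            {"host1", "host2"},
            {"host1", "host2"},
            {"host1", "host2"},
        };
        // prints [host1, host2, host1, host2, host1, host2]
        System.out.println(Arrays.toString(hostsWithList(blockHosts)));
        // prints [host1, host2]
        System.out.println(Arrays.toString(hostsWithSet(blockHosts)));
    }
}
```

With the List, the number of reported locations grows with the number of blocks in the split; with the Set, it is bounded by the number of distinct hosts.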

      1. patch-2021.txt
        5 kB
        Amareshwari Sriramadasu
      2. patch-2021-ydist.txt
        5 kB
        Amareshwari Sriramadasu
      3. patch-2021-1.txt
        6 kB
        Amareshwari Sriramadasu
      4. patch-2021-ydist.txt
        6 kB
        Amareshwari Sriramadasu

        Issue Links

          Activity

          Amareshwari Sriramadasu created issue -
          Amareshwari Sriramadasu added a comment -

          Patch with the fix and regression test.

          Amareshwari Sriramadasu made changes -
          Field Original Value New Value
          Attachment patch-2021.txt [ 12452605 ]
          Amareshwari Sriramadasu added a comment -

          test-patch result:

               [exec] +1 overall.
               [exec]
               [exec]     +1 @author.  The patch does not contain any @author tags.
               [exec]
               [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
               [exec]
               [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
               [exec]
               [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
               [exec]
               [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
               [exec]
               [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
               [exec]
          
          Amareshwari Sriramadasu made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Greg Roelofs added a comment -

          Note the similarity to MAPREDUCE-1974. (Not the same issue, though.)

          Amareshwari Sriramadasu added a comment -

          All the core and contrib tests passed with the patch.

          Amareshwari Sriramadasu added a comment -

          Patch for Yahoo! distribution. The patch is on top of HADOOP-5759.

          Amareshwari Sriramadasu made changes -
          Attachment patch-2021-ydist.txt [ 12452787 ]
          Scott Chen added a comment -

          Looks good. We will need this change too.
          One nitpick: Can you also change the comments in the beginning of testSplitPlacement() which describe how files are created?

          Amareshwari Sriramadasu added a comment -

          Patch changes the comments for testSplitPlacement().
          test-patch and ant test ran successfully. All core and contrib tests passed except TestTaskLauncher and TestTaskTrackerLocalization (due to MAPREDUCE-2031) and TestJobOutputCommitter (due to MAPREDUCE-2032).

          Amareshwari Sriramadasu made changes -
          Attachment patch-2021-1.txt [ 12453025 ]
          Amareshwari Sriramadasu added a comment -

          Scott, can you have a look at the latest patch? I would like to check this in if there are no further comments. Thanks.

          Amareshwari Sriramadasu added a comment -

          Patch has been tested on our cluster. Assuming there are no more comments from Scott, I just committed this.

          Amareshwari Sriramadasu made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags [Reviewed]
          Resolution Fixed [ 1 ]
          Scott Chen added a comment -

          Sorry for the late reply. I just got back from China for the Hadoop China conference.
          The patch looks good. Thanks for the fix!

          Amareshwari Sriramadasu added a comment -

          Patch for Yahoo! distribution with comment for test-case fixed.

          Amareshwari Sriramadasu made changes -
          Attachment patch-2021-ydist.txt [ 12453983 ]
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #523 (See https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/523/)

          Konstantin Shvachko made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Harsh J made changes -
          Link This issue is duplicated by MAPREDUCE-4593 [ MAPREDUCE-4593 ]
          Harsh J made changes -
          Link This issue relates HIVE-3387 [ HIVE-3387 ]
          Gavin made changes -
          Link This issue relates to HIVE-3387 [ HIVE-3387 ]
          Gavin made changes -
          Link This issue relates to HIVE-3387 [ HIVE-3387 ]

            People

            • Assignee: Amareshwari Sriramadasu
            • Reporter: Amareshwari Sriramadasu
            • Votes: 0
            • Watchers: 3
