Hadoop Map/Reduce / MAPREDUCE-2862

Infinite loop in CombineFileInputFormat#getMoreSplits(), with missing blocks

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      Hi, we hit an infinite loop in CombineFileInputFormat#getMoreSplits().

      First, we lost some blocks through an operational mistake. Then a job tried to read those missing blocks, and getMoreSplits() went into an infinite loop.

      From our investigation, the list at this line can be empty:
      > https://github.com/apache/hadoop-mapreduce/blob/trunk/src/java/org/apache/hadoop/mapreduce/lib/input/CombineFileInputFormat.java#L363

      The 'for' loop just after that line then does nothing, and the entry is not removed from 'blockToNodes'.

      As a result, the loop at this line never terminates:
      > https://github.com/apache/hadoop-mapreduce/blob/trunk/src/java/org/apache/hadoop/mapreduce/lib/input/CombineFileInputFormat.java#L348

      We're now creating a patch for this problem...
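
      For illustration, the failure mode can be reproduced in miniature. The sketch below is a simplified stand-in for getMoreSplits(), not the actual Hadoop source; only the name 'blockToNodes' follows the real code, everything else is assumed:

          import java.util.ArrayList;
          import java.util.HashMap;
          import java.util.Iterator;
          import java.util.List;
          import java.util.Map;

          public class GetMoreSplitsLoopSketch {
            public static void main(String[] args) {
              // Each remaining block mapped to the nodes holding a replica.
              // A missing block has an empty node list.
              Map<String, List<String>> blockToNodes = new HashMap<>();
              blockToNodes.put("blk_normal", new ArrayList<>(List.of("node1")));
              blockToNodes.put("blk_missing", new ArrayList<>()); // no replicas left

              // Simplified shape of getMoreSplits(): loop until every block
              // has been assigned to a split and removed from the map.
              int rounds = 0;
              while (!blockToNodes.isEmpty() && rounds++ < 10) { // bounded for the demo
                Iterator<Map.Entry<String, List<String>>> it =
                    blockToNodes.entrySet().iterator();
                while (it.hasNext()) {
                  Map.Entry<String, List<String>> entry = it.next();
                  // For a block with no locations this inner loop never runs,
                  // so the entry is never removed from blockToNodes and the
                  // outer while-loop (unbounded in the real code) spins forever.
                  for (String node : entry.getValue()) {
                    it.remove(); // stands in for assigning the block to a split on 'node'
                    break;
                  }
                }
              }
              // Prints: left over after 10 rounds: [blk_missing]
              System.out.println("left over after 10 rounds: " + blockToNodes.keySet());
            }
          }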

        Activity

        kzk Kazuki Ohta added a comment -

        There would be a few options...

        • cause an error
        • add an option to ignore these errors (log missing blocks at WARN level); a rough sketch follows below
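
        As a rough sketch of the second option (the handleMissingBlock helper and the policy flag are hypothetical; no such switch exists in Hadoop, and a real patch would read the flag from the job Configuration and use the project's own logging):

            import java.io.IOException;
            import java.util.logging.Logger;

            public class MissingBlockPolicySketch {
              private static final Logger LOG =
                  Logger.getLogger(MissingBlockPolicySketch.class.getName());

              // Hypothetical helper: decide what to do with a block that has
              // no locations, instead of letting getMoreSplits() loop forever.
              static void handleMissingBlock(String block, boolean ignoreMissing)
                  throws IOException {
                if (ignoreMissing) {
                  // Option 2: skip the block but surface it at WARN level.
                  LOG.warning("Ignoring block with no locations: " + block);
                } else {
                  // Option 1: fail fast with an error.
                  throw new IOException("Block has no locations: " + block);
                }
              }

              public static void main(String[] args) {
                try {
                  handleMissingBlock("blk_123", true);  // warns and continues
                  handleMissingBlock("blk_123", false); // would fail the job
                } catch (IOException e) {
                  System.out.println("job fails: " + e.getMessage());
                }
              }
            }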
        frsyuki Sadayuki Furuhashi added a comment -

        Breaks the infinite loop in CombineFileInputFormat.getMoreSplits() by ignoring corrupted blocks.

        frsyuki Sadayuki Furuhashi added a comment -

        I attached a patch against git HEAD. It logs error messages and ignores corrupted blocks.

        tlipcon Todd Lipcon added a comment -

        Hey Sadayuki. Good to see you here on JIRA. I think the patch you've attached is against the 0.20 branch. Can you please provide a patch against trunk as well? Thanks.

        cnauroth Chris Nauroth added a comment -

        Sadayuki, thank you for submitting a patch on this. I've been bitten by this one too.

        This patch would log warnings about "corrupted files". Is it really true that this indicates corruption? My experience has been that I've seen this happen when CombineFileInputFormat tries to read newly written files that have not yet had their first block flushed. This isn't really corruption, so I'm wondering if logging warnings about corrupt files would give a user the wrong impression that the cluster is suffering from corruption.

        To work around this, I've been running my jobs with a private patch of CombineFileInputFormat that adds this to the constructor of OneFileInfo:

        // Bail out if the block has no locations. This guards against an
        // infinite loop in getMoreSplits. This change is not present in open
        // source Hadoop.
        if (oneblock.length <= 0) {
          continue;
        }

        That prevents these blocks from ever entering the getMoreSplits logic in the first place. If you're interested in that approach instead, let me know, and I'll put the patch together. I'd still need to add a unit test for it too.

        Thanks again,
        --Chris
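
        To see the guard's effect in isolation, the following self-contained sketch filters zero-length blocks before they reach the split-assignment map. BlockInfo is a simplified stand-in for Hadoop's OneBlockInfo; only the 'length <= 0' check mirrors the workaround above:

            import java.util.HashMap;
            import java.util.List;
            import java.util.Map;

            public class ZeroLengthBlockGuardSketch {
              // Simplified stand-in for Hadoop's OneBlockInfo.
              static class BlockInfo {
                final String name;
                final long length;
                final List<String> hosts;
                BlockInfo(String name, long length, List<String> hosts) {
                  this.name = name;
                  this.length = length;
                  this.hosts = hosts;
                }
              }

              public static void main(String[] args) {
                List<BlockInfo> blocks = List.of(
                    new BlockInfo("blk_data", 134217728L, List.of("node1", "node2")),
                    new BlockInfo("blk_unflushed", 0L, List.of())); // first block not yet flushed

                Map<BlockInfo, List<String>> blockToNodes = new HashMap<>();
                for (BlockInfo oneblock : blocks) {
                  // The guard from the workaround: a block with no data never
                  // enters blockToNodes, so getMoreSplits() never sees it and
                  // cannot loop on it.
                  if (oneblock.length <= 0) {
                    continue;
                  }
                  blockToNodes.put(oneblock, oneblock.hosts);
                }
                // Prints: blocks entering split logic: 1
                System.out.println("blocks entering split logic: " + blockToNodes.size());
              }
            }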

        subrotosanyal Subroto Sanyal added a comment -

        @Kazuki
        Are you hitting the same problem as MAPREDUCE-2185?


          People

          • Assignee: Unassigned
          • Reporter: kzk Kazuki Ohta
          • Votes: 1
          • Watchers: 5
