Hadoop Common
HADOOP-2275

Erroneous detection of corrupted file when namenode fails to allocate any datanodes for newly allocated block

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.15.0
    • Fix Version/s: 0.16.0
    • Component/s: None
    • Labels:
      None

      Description

      It can happen that the namenode allocates a block for a file and then fails to allocate any datanode for that block. The namenode delivers an exception to the client, and the client retries. However, the block remains associated with the file (until lease expiration), which causes all client retries to fail.

      An fsck (before the lease expires) reports this block as a missing block.

      1. badBlocks1.patch
        5 kB
        dhruba borthakur
      2. badBlocks1.patch
        5 kB
        dhruba borthakur

        Activity

        Hudson added a comment -

        Integrated in Hadoop-Nightly #318 (See http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/318/ )

        dhruba borthakur added a comment -

        I just committed this.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12370226/badBlocks1.patch
        against trunk revision r599162.

        @author +1. The patch does not contain any @author tags.

        javadoc +1. The javadoc tool did not generate any warning messages.

        javac +1. The applied patch does not generate any new compiler warnings.

        findbugs +1. The patch does not introduce any new Findbugs warnings.

        core tests +1. The patch passed core unit tests.

        contrib tests -1. The patch failed contrib unit tests.

        Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1193/testReport/
        Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1193/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1193/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1193/console

        This message is automatically generated.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12370226/badBlocks1.patch
        against trunk revision r598984.

        @author +1. The patch does not contain any @author tags.

        javadoc +1. The javadoc tool did not generate any warning messages.

        javac +1. The applied patch does not generate any new compiler warnings.

        findbugs +1. The patch does not introduce any new Findbugs warnings.

        core tests -1. The patch failed core unit tests.

        contrib tests -1. The patch failed contrib unit tests.

        Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1189/testReport/
        Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1189/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1189/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1189/console

        This message is automatically generated.

        Raghu Angadi added a comment -

        That's right. +1 for the patch.

        dhruba borthakur added a comment -

        If you call chooseTarget first, then you would need to acquire the FSNamesystem lock twice: the first time to get the replication factor of the file, and the second time to insert the block into the INode. This would happen on the normal code path.

        On the other hand, the current patch needs to acquire the FSNamesystem global lock twice only in the case of an error. The normal code path remains unaffected.
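
        The trade-off in this comment can be illustrated with a minimal sketch that only counts lock acquisitions under the two orderings. This is a toy model, not the FSNamesystem code: the class LockOrderingSketch, its methods, and the boolean targetsAvailable flag are hypothetical, and it assumes target selection runs outside the global lock in both orderings.

```java
import java.util.concurrent.locks.ReentrantLock;

// Toy model (hypothetical names): counts how often a global lock is taken
// under the two orderings discussed in this comment.
class LockOrderingSketch {
    private final ReentrantLock globalLock = new ReentrantLock();
    int acquisitions = 0;

    // Run a step while holding the global lock, counting the acquisition.
    private void locked(Runnable step) {
        globalLock.lock();
        try {
            acquisitions++;
            step.run();
        } finally {
            globalLock.unlock();
        }
    }

    // Ordering A: choose targets first, then allocate the block.
    boolean chooseFirst(boolean targetsAvailable) {
        locked(() -> { /* read the file's replication factor */ });
        if (!targetsAvailable) {
            return false; // nothing was attached to the file, nothing to undo
        }
        locked(() -> { /* insert the new block into the file */ });
        return true;
    }

    // Ordering B (as described for the patch): allocate first, undo on error.
    boolean allocateFirst(boolean targetsAvailable) {
        locked(() -> { /* read replication factor and insert the block */ });
        if (!targetsAvailable) {
            locked(() -> { /* remove the block from the file again */ });
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        LockOrderingSketch a = new LockOrderingSketch();
        a.chooseFirst(true);
        LockOrderingSketch b = new LockOrderingSketch();
        b.allocateFirst(true);
        System.out.println("normal path, choose first:   " + a.acquisitions);
        System.out.println("normal path, allocate first: " + b.acquisitions);
    }
}
```

        In this model the normal path costs two acquisitions with ordering A and one with ordering B, while the error path costs one and two respectively, which matches the argument for keeping the extra acquisition on the rare path.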

        Raghu Angadi added a comment -

        Would it be better to call chooseTarget() first and then allocate a new block? In that case, you don't need to remove the block from the file.

        dhruba borthakur added a comment -

        If the namenode fails to allocate any datanode for a newly allocated block, it removes that block from the file before returning an exception to the client.
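
        The behaviour described here can be sketched roughly as follows. This is a simplified toy model and not the contents of badBlocks1.patch: ToyNamesystem, addBlock, chooseTargets, and the liveDatanodes counter are all hypothetical stand-ins, and a single file's block list replaces the real namespace structures.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical names) of "remove the block before reporting failure".
class ToyNamesystem {
    private final List<Long> fileBlocks = new ArrayList<>(); // blocks of one open file
    private long nextBlockId = 1;
    private volatile int liveDatanodes;

    ToyNamesystem(int liveDatanodes) {
        this.liveDatanodes = liveDatanodes;
    }

    void setLiveDatanodes(int n) {
        liveDatanodes = n;
    }

    synchronized List<Long> blocks() {
        return new ArrayList<>(fileBlocks);
    }

    // Stand-in for target selection: returns how many datanodes were chosen.
    private int chooseTargets(int replication) {
        return Math.min(replication, liveDatanodes);
    }

    long addBlock(int replication) throws IOException {
        long blockId;
        synchronized (this) {
            // Normal path: attach the new block to the file.
            blockId = nextBlockId++;
            fileBlocks.add(blockId);
        }
        if (chooseTargets(replication) == 0) {
            synchronized (this) {
                // Error path only: detach the block so that a retry starts
                // clean and the file does not keep a block with no replicas.
                fileBlocks.remove(Long.valueOf(blockId));
            }
            throw new IOException("no datanode available for block " + blockId);
        }
        return blockId;
    }

    public static void main(String[] args) throws IOException {
        ToyNamesystem ns = new ToyNamesystem(0);
        try {
            ns.addBlock(3);
        } catch (IOException e) {
            System.out.println("first attempt failed: " + e.getMessage());
        }
        System.out.println("blocks after failure: " + ns.blocks());
        ns.setLiveDatanodes(3);
        ns.addBlock(3);
        System.out.println("blocks after retry:   " + ns.blocks());
    }
}
```

        Without the removal step in the error path, the failed block would stay in the file's block list in this model, which is the state the issue description says fsck reports as a missing block.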


          People

          • Assignee: dhruba borthakur
          • Reporter: dhruba borthakur
          • Votes: 0
          • Watchers: 0
