Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.17.0
    • Fix Version/s: 0.18.0
    • Component/s: None
    • Labels:
      None
    • Environment:

      17.0 + H1979-H2159-H3442

    • Hadoop Flags:
      Reviewed

      Description

      I see a bunch of datanodes that have stopped verifying local blocks.

      ".out" showed

      -rw-r--r--  1 hdfs users 614 Jun 23 10:24 datanode.out
      
      Exception in thread "org.apache.hadoop.dfs.DataBlockScanner@aadc97" java.lang.NumberFormatException: For input string: "121195228080date="
              at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
              at java.lang.Long.parseLong(Long.java:412)
              at java.lang.Long.valueOf(Long.java:518)
              at org.apache.hadoop.dfs.DataBlockScanner$LogEntry.parseEntry(DataBlockScanner.java:351)
              at org.apache.hadoop.dfs.DataBlockScanner.assignInitialVerificationTimes(DataBlockScanner.java:481)
              at org.apache.hadoop.dfs.DataBlockScanner.run(DataBlockScanner.java:534)
              at java.lang.Thread.run(Thread.java:619)
      

      Namenode log also showed

      2008-06-23 10:24:12,831 WARN org.apache.hadoop.dfs.DataBlockScanner: RuntimeException during DataBlockScanner.run() : java.lang.NumberFormatException: For input string: "121195228080date="
              at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
              at java.lang.Long.parseLong(Long.java:412)
              at java.lang.Long.valueOf(Long.java:518)
              at org.apache.hadoop.dfs.DataBlockScanner$LogEntry.parseEntry(DataBlockScanner.java:351)
              at org.apache.hadoop.dfs.DataBlockScanner.assignInitialVerificationTimes(DataBlockScanner.java:481)
              at org.apache.hadoop.dfs.DataBlockScanner.run(DataBlockScanner.java:534)
              at java.lang.Thread.run(Thread.java:619)
      

      The datanode was still up and running, but no verification was taking place.
      jstack didn't show a DataBlockScanner thread.

      1. 3635_20080627.patch
        6 kB
        Tsz Wo Nicholas Sze
      2. 3635_20080630.patch
        6 kB
        Tsz Wo Nicholas Sze
      3. jstack.H3635.txt
        10 kB
        Koji Noguchi

        Activity

        Tsz Wo Nicholas Sze added a comment -

        DataBlockScanner object reads log files "dncp_block_verification.log.curr" and "dncp_block_verification.log.prev". These log files contain lines like

        date="2008-06-18 11:44:38,984" time="1213814678984" genstamp="1001" id="-969938011182314584"
        

        From the error message in the description, it seems to me that the log file was corrupted. ("121195228080date=": the number looks like a timestamp but it is somehow followed by another line.) Could you check your log files? They can be found in your datanode directory .../hadoop-SYSTEM/dfs/data/current

        Koji Noguchi added a comment -
        $ grep 121195228080 dncp_block_verification.log.curr
        date="2008-05-28 05:24:40,808"   time="121195228080date="2008-05-28 06:01:58,818"        time="1211954518818"    id="4027181978358848026"
        

        Nicholas, you're right. It is corrupted.
        Out of 2000 datanodes, 54 datanodes are in this state after 3 days of uptime.
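
        To illustrate why the merged line above breaks the scanner, here is a minimal sketch that extracts the time field from a log line and converts it with Long.valueOf, as the stack trace suggests parseEntry does. The class name, the regular expression, and the parseTime helper are assumptions made for illustration; they are not taken from the DataBlockScanner source.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ScanLogLineDemo {
    // Hypothetical pattern for key="value" pairs; the real parser may differ.
    private static final Pattern ENTRY = Pattern.compile("(\\w+)=\"(.*?)\"");

    /** Returns the first time field as a long; throws NumberFormatException if it is not numeric. */
    static long parseTime(String line) {
        Matcher m = ENTRY.matcher(line);
        while (m.find()) {
            if (m.group(1).equals("time")) {
                return Long.valueOf(m.group(2));
            }
        }
        throw new IllegalArgumentException("no time field: " + line);
    }

    public static void main(String[] args) {
        // Well-formed line, copied from the example earlier in this issue.
        String good = "date=\"2008-06-18 11:44:38,984\" time=\"1213814678984\" "
            + "genstamp=\"1001\" id=\"-969938011182314584\"";
        // Corrupted line from the grep output: two entries fused into one.
        String bad = "date=\"2008-05-28 05:24:40,808\"   time=\"121195228080date=\""
            + "2008-05-28 06:01:58,818\"        time=\"1211954518818\"    "
            + "id=\"4027181978358848026\"";

        System.out.println(parseTime(good));
        try {
            parseTime(bad);
        } catch (NumberFormatException e) {
            // The value between the quotes is "121195228080date=", which is not a number.
            System.out.println(e.getMessage());
        }
    }
}
```

        In this sketch the truncated first entry leaves the time value running up to the next quote, so the string handed to Long.valueOf is the same "121195228080date=" reported in the exception.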

        Tsz Wo Nicholas Sze added a comment -

        > Out of 2000 datanodes, 54 datanodes are in this state after 3 days of uptime.

        Do you mean that there is no datanode restart during this 3-day uptime? If it is the case, we should mark this a 0.18 blocker.

        Tsz Wo Nicholas Sze added a comment -

        The log files are used for updating BlockScanInfo, so that the same block checksum won't be verified too frequently. If some lines of a log file are corrupted, these lines should be ignored.
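
        One way to read the log while ignoring corrupted lines could look like the sketch below. It is not the committed patch: the class, the parseOrNull and load helpers, and the map of block id to latest verification time are hypothetical stand-ins for LogEntry.parseEntry and BlockScanInfo.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TolerantScanLogReader {
    // Hypothetical pattern for key="value" pairs; the real parser may differ.
    private static final Pattern ENTRY = Pattern.compile("(\\w+)=\"(.*?)\"");

    /** Parses one line into {blockId, time}; returns null if the line is malformed. */
    static long[] parseOrNull(String line) {
        Long id = null;
        Long time = null;
        try {
            Matcher m = ENTRY.matcher(line);
            while (m.find()) {
                String key = m.group(1);
                if (key.equals("id")) {
                    id = Long.valueOf(m.group(2));
                } else if (key.equals("time")) {
                    time = Long.valueOf(m.group(2));
                }
            }
        } catch (NumberFormatException e) {
            return null; // corrupted numeric field: report the line as unusable
        }
        return (id == null || time == null) ? null : new long[] {id, time};
    }

    /** Builds block id -> latest verification time, skipping lines that fail to parse. */
    static Map<Long, Long> load(List<String> lines) {
        Map<Long, Long> lastVerified = new HashMap<>();
        for (String line : lines) {
            long[] entry = parseOrNull(line);
            if (entry == null) {
                continue; // ignore the corrupted line instead of aborting the scan
            }
            lastVerified.merge(entry[0], entry[1], Long::max);
        }
        return lastVerified;
    }
}
```

        Skipping a line only means the affected block may be re-verified earlier than necessary, which is cheaper than letting one bad line stop the scanner thread.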

        Tsz Wo Nicholas Sze added a comment -

        There are race conditions in DataBlockScanner.LogFileHandler.out. Some accesses of it are not synchronized.
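
        As a rough illustration of the kind of fix this implies, the sketch below guards every access to a shared writer with the same lock, so that appending a line and swapping the writer can never interleave. SyncLogHandler, appendLine, roll, and writeConcurrently are hypothetical names; the actual LogFileHandler code is not reproduced here.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

class SyncLogHandler {
    // Shared state: every read or write of these fields happens under the object's lock.
    private StringWriter buffer = new StringWriter();
    private PrintWriter out = new PrintWriter(buffer);

    /** Appends one complete line; the two print calls form a single critical section. */
    synchronized void appendLine(String line) {
        out.print(line);
        out.print('\n');
    }

    /** Swaps in a fresh writer and returns everything written to the old one. */
    synchronized String roll() {
        out.flush();
        String previous = buffer.toString();
        buffer = new StringWriter();
        out = new PrintWriter(buffer);
        return previous;
    }

    /** Demo helper: several threads append lines concurrently, then the log is rolled. */
    static String writeConcurrently(int threads, int linesPerThread) {
        SyncLogHandler handler = new SyncLogHandler();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < linesPerThread; i++) {
                    handler.appendLine("time=\"" + i + "\" id=\"" + id + "\"");
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return handler.roll();
    }
}
```

        If appendLine were not synchronized, another thread could write or roll between the two print calls, which is one plausible way for two entries to end up fused on a single line.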

        Tsz Wo Nicholas Sze added a comment -

        3635_20080627.patch: synchronizes all accesses of out and ignores lines in the log file that fail to parse.

        Tsz Wo Nicholas Sze added a comment -

        Promoting this to a 0.18 blocker.

        Tsz Wo Nicholas Sze added a comment -

        Passed tests locally. Submitting.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12384872/3635_20080627.patch
        against trunk revision 672376.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no tests are needed for this patch.

        -1 patch. The patch command could not apply the patch.

        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2763/console

        This message is automatically generated.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12384872/3635_20080627.patch
        against trunk revision 672376.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no tests are needed for this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2768/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2768/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2768/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2768/console

        This message is automatically generated.

        Hairong Kuang added a comment -

        I think flush is not needed after writing each line in the log. Otherwise the patch looks good.

        Tsz Wo Nicholas Sze added a comment -

        3635_20080630.patch: removed flush().

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12384973/3635_20080630.patch
        against trunk revision 672848.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no tests are needed for this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2773/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2773/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2773/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2773/console

        This message is automatically generated.

        Hairong Kuang added a comment -

        I've just committed this. Thanks, Nicholas!

        Hudson added a comment -

        Integrated in Hadoop-trunk #535 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/535/ )

          People

          • Assignee:
            Tsz Wo Nicholas Sze
            Reporter:
            Koji Noguchi