Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.17.0
    • Fix Version/s: 0.18.0
    • Component/s: None
    • Labels:
      None
    • Environment:

      17.0 + H1979-H2159-H3442

    • Hadoop Flags:
      Reviewed

      Description

      I see a bunch of datanodes that have stopped verifying their local blocks.

      The .out file (datanode.out) showed:

      -rw-r--r--  1 hdfs users 614 Jun 23 10:24 datanode.out
      
      Exception in thread "org.apache.hadoop.dfs.DataBlockScanner@aadc97" java.lang.NumberFormatException: For input string: "121195228080date="
              at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
              at java.lang.Long.parseLong(Long.java:412)
              at java.lang.Long.valueOf(Long.java:518)
              at org.apache.hadoop.dfs.DataBlockScanner$LogEntry.parseEntry(DataBlockScanner.java:351)
              at org.apache.hadoop.dfs.DataBlockScanner.assignInitialVerificationTimes(DataBlockScanner.java:481)
              at org.apache.hadoop.dfs.DataBlockScanner.run(DataBlockScanner.java:534)
              at java.lang.Thread.run(Thread.java:619)
      

      The datanode log also showed:

      2008-06-23 10:24:12,831 WARN org.apache.hadoop.dfs.DataBlockScanner: RuntimeException during DataBlockScanner.run() : java.lang.NumberFormatException: For input string: "121195228080date="
              at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
              at java.lang.Long.parseLong(Long.java:412)
              at java.lang.Long.valueOf(Long.java:518)
              at org.apache.hadoop.dfs.DataBlockScanner$LogEntry.parseEntry(DataBlockScanner.java:351)
              at org.apache.hadoop.dfs.DataBlockScanner.assignInitialVerificationTimes(DataBlockScanner.java:481)
              at org.apache.hadoop.dfs.DataBlockScanner.run(DataBlockScanner.java:534)
              at java.lang.Thread.run(Thread.java:619)
      

      The datanode was still up and running, but no block verification was taking place.
      jstack did not show the DataBlockScanner thread.
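The symptoms (datanode alive, scanner thread missing from jstack) are consistent with ordinary Java thread semantics: an exception that escapes a thread's run() ends that thread only, and the rest of the JVM keeps going. The sketch below is a minimal, hypothetical reproduction, not Hadoop code; the class name ScannerThreadDeath, the method runAndCapture, and the thread name are invented for illustration. It feeds the string from the stack trace to Long.valueOf on a separate thread and records the exception that ended it.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical reproduction sketch; not the DataBlockScanner implementation.
public class ScannerThreadDeath {

    /** Runs task on a new thread, waits for it, and returns the exception that ended it (or null). */
    static Throwable runAndCapture(Runnable task) {
        AtomicReference<Throwable> cause = new AtomicReference<>();
        Thread t = new Thread(task, "block-scanner-sketch");
        // Record the uncaught exception instead of letting the default handler print it.
        t.setUncaughtExceptionHandler((thread, e) -> cause.set(e));
        t.start();
        try {
            t.join(); // the handler runs before the thread terminates, so join() sees its result
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return cause.get();
    }

    public static void main(String[] args) {
        // The corrupted "time" value reported in the stack trace.
        Throwable cause = runAndCapture(() -> Long.valueOf("121195228080date="));
        System.out.println("worker thread ended with: " + cause);
        System.out.println("main thread is still running");
    }
}
```

Nothing restarts the worker after it dies, which matches a datanode that keeps serving while block verification silently stops.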

      1. 3635_20080627.patch
        6 kB
        Tsz Wo Nicholas Sze
      2. 3635_20080630.patch
        6 kB
        Tsz Wo Nicholas Sze
      3. jstack.H3635.txt
        10 kB
        Koji Noguchi

        Activity

        Transition                    Time In Source Status  Execution Times  Last Executer        Last Execution Date
        Patch Available -> Open       2d 1h                  1                Tsz Wo Nicholas Sze  30/Jun/08 19:25
        Open -> Patch Available       3d 17h 22m             2                Tsz Wo Nicholas Sze  30/Jun/08 19:25
        Patch Available -> Resolved   2h 35m                 1                Hairong Kuang        30/Jun/08 22:01
        Resolved -> Closed            52d 22h 48m            1                Nigel Daley          22/Aug/08 20:50
        Owen O'Malley made changes -
        Component/s dfs [ 12310710 ]
        Nigel Daley made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Hudson added a comment - Integrated in Hadoop-trunk #535 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/535/ )
        Hairong Kuang made changes -
        Hadoop Flags [Reviewed]
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Hairong Kuang added a comment -

        I've just committed this. Thanks, Nicholas!

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12384973/3635_20080630.patch
        against trunk revision 672848.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no tests are needed for this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2773/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2773/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2773/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2773/console

        This message is automatically generated.

        Tsz Wo Nicholas Sze made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Tsz Wo Nicholas Sze made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Tsz Wo Nicholas Sze made changes -
        Attachment 3635_20080630.patch [ 12384973 ]
        Tsz Wo Nicholas Sze added a comment -

        3635_20080630.patch: removed flush().

        Hairong Kuang added a comment -

        I think flush() is not needed after writing each line to the log. Otherwise, the patch looks good.
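Whether to call flush() after every log line is a trade-off between I/O cost and how quickly recent entries reach the file. A minimal sketch, assuming a plain java.io.BufferedWriter over a temporary file (BufferedLogSketch, writeWithoutPerLineFlush, and the entry text are illustrative, not the patch's code), shows that without a per-line flush short entries stay in the writer's buffer until the buffer fills or the writer is closed.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative sketch of appending log entries without flushing after each line.
public class BufferedLogSketch {

    /** Writes n short entries and returns {bytes on disk before close, bytes on disk after close}. */
    static long[] writeWithoutPerLineFlush(int n) {
        try {
            Path file = Files.createTempFile("verification", ".log");
            long before;
            try (BufferedWriter out = Files.newBufferedWriter(file)) {
                for (int i = 0; i < n; i++) {
                    out.write("time=\"" + (1213814678984L + i) + "\"");
                    out.newLine(); // deliberately no out.flush() here
                }
                before = Files.size(file); // small writes are still in the writer's buffer
            } // close() flushes whatever is buffered
            long after = Files.size(file);
            Files.delete(file);
            return new long[] {before, after};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] sizes = writeWithoutPerLineFlush(10);
        System.out.println("before close: " + sizes[0] + " bytes, after close: " + sizes[1] + " bytes");
    }
}
```

Since this log is only used to keep blocks from being re-verified too often, losing a few unflushed entries on a crash would presumably just cause some extra verification; that reading is an assumption, not something stated in the review.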

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12384872/3635_20080627.patch
        against trunk revision 672376.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no tests are needed for this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2768/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2768/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2768/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2768/console

        This message is automatically generated.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12384872/3635_20080627.patch
        against trunk revision 672376.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no tests are needed for this patch.

        -1 patch. The patch command could not apply the patch.

        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2763/console

        This message is automatically generated.

        Tsz Wo Nicholas Sze made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Tsz Wo Nicholas Sze added a comment -

        Passed tests locally. Submitting.

        Tsz Wo Nicholas Sze made changes -
        Priority Major [ 3 ] Blocker [ 1 ]
        Fix Version/s 0.18.0 [ 12312972 ]
        Tsz Wo Nicholas Sze added a comment -

        Promoting this to a 0.18 blocker.

        Tsz Wo Nicholas Sze made changes -
        Attachment 3635_20080627.patch [ 12384872 ]
        Tsz Wo Nicholas Sze added a comment -

        3635_20080627.patch: synchronized all accesses to out, and ignore lines in the log file that have parsing errors.

        Tsz Wo Nicholas Sze added a comment -

        There are race conditions on DataBlockScanner.LogFileHandler.out: some accesses to it are not synchronized.
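One way to picture "synchronize all accesses to out": route every write to the shared writer through methods that hold the same lock, so a whole entry (text plus newline) is written before another thread can write. This is a hypothetical stand-in (SyncedVerificationLog, appendLine, and writeConcurrently are invented names) that writes to an in-memory StringWriter; it illustrates the locking discipline only and does not reproduce the actual race in LogFileHandler.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical sketch: all access to the shared writer is guarded by one lock (this object).
public class SyncedVerificationLog {
    private final Writer out;

    SyncedVerificationLog(Writer out) {
        this.out = out;
    }

    /** Appends one complete entry; the lock covers both the entry text and its newline. */
    synchronized void appendLine(String entry) throws IOException {
        out.write(entry);
        out.write('\n');
    }

    synchronized void close() throws IOException {
        out.close();
    }

    /** Has each of {@code threads} threads append {@code perThread} entries; returns the log text. */
    static String writeConcurrently(int threads, int perThread) {
        StringWriter sink = new StringWriter();
        SyncedVerificationLog log = new SyncedVerificationLog(sink);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    try {
                        log.appendLine("time=\"" + (1213814678984L + i) + "\" id=\"" + id + "\"");
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
            log.close();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toString();
    }

    public static void main(String[] args) {
        String text = writeConcurrently(8, 500);
        System.out.println(text.split("\n").length + " complete entries written");
    }
}
```

If appendLine were not synchronized, the two write calls from different threads could interleave, so an entry's newline might land after another thread's text.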

        Tsz Wo Nicholas Sze added a comment -

        The log files are used to update BlockScanInfo so that the same block's checksum is not verified too frequently. If some lines of a log file are corrupted, those lines should be ignored.
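A minimal sketch of "ignore corrupted lines", assuming each entry carries a time="<millis>" field as in the sample lines quoted in this issue. TolerantLogReader, parseTime, and readTimes are hypothetical names, not the DataBlockScanner API. A line whose first time value does not parse as a long is skipped, so NumberFormatException never escapes to the scanner thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of tolerant log reading; not Hadoop's LogEntry.parseEntry.
public class TolerantLogReader {
    // Assumed format: the first time="<millis>" field on a line holds the verification time.
    private static final Pattern TIME = Pattern.compile("time=\"([^\"]*)\"");

    /** Returns the verification time, or empty if the field is missing or not a valid long. */
    static OptionalLong parseTime(String line) {
        Matcher m = TIME.matcher(line);
        if (!m.find()) {
            return OptionalLong.empty();
        }
        try {
            return OptionalLong.of(Long.parseLong(m.group(1)));
        } catch (NumberFormatException e) {
            return OptionalLong.empty(); // corrupted value: skip this line and keep going
        }
    }

    /** Collects times from all well-formed lines; corrupted lines are ignored. */
    static List<Long> readTimes(List<String> lines) {
        List<Long> times = new ArrayList<>();
        for (String line : lines) {
            parseTime(line).ifPresent(t -> times.add(t));
        }
        return times;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "date=\"2008-06-18 11:44:38,984\" time=\"1213814678984\" genstamp=\"1001\" id=\"-969938011182314584\"",
            "date=\"2008-05-28 05:24:40,808\"   time=\"121195228080date=\"2008-05-28 06:01:58,818\""
                + "        time=\"1211954518818\"    id=\"4027181978358848026\"");
        System.out.println(readTimes(lines)); // prints [1213814678984]
    }
}
```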

        Robert Chansler made changes -
        Assignee Tsz Wo (Nicholas), SZE [ szetszwo ]
        Tsz Wo Nicholas Sze added a comment -

        > Out of 2000 datanodes, 54 datanodes are in this state after 3 days of uptime.

        Do you mean that no datanode was restarted during these 3 days of uptime? If so, we should mark this as a 0.18 blocker.

        Koji Noguchi added a comment -
        $ grep 121195228080 dncp_block_verification.log.curr
        date="2008-05-28 05:24:40,808"   time="121195228080date="2008-05-28 06:01:58,818"        time="1211954518818"    id="4027181978358848026"
        

        Nicholas, you're right. It is corrupted.
        Out of 2000 datanodes, 54 datanodes are in this state after 3 days of uptime.

        Tsz Wo Nicholas Sze added a comment -

        The DataBlockScanner object reads the log files "dncp_block_verification.log.curr" and "dncp_block_verification.log.prev". These log files contain lines like

        date="2008-06-18 11:44:38,984" time="1213814678984" genstamp="1001" id="-969938011182314584"
        

        From the error message in the description, it seems to me that the log file was corrupted. ("121195228080date=": the number looks like a timestamp but it is somehow followed by another line.) Could you check your log files? They can be found in your datanode directory .../hadoop-SYSTEM/dfs/data/current
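The corrupted value can be explained by how a key="value" reader scans such a line: the time value runs up to the next double quote, which in the corrupted line from Koji's grep output belongs to the following entry's date field. A minimal sketch, assuming a regex-based reader that keeps the first occurrence of each key (LogLineFields and parse are hypothetical; Hadoop's LogEntry.parseEntry may work differently), splits a line into its fields.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical field splitter for lines like: date="..." time="..." genstamp="..." id="..."
public class LogLineFields {
    // key="value" where the value cannot contain a double quote (assumed format).
    private static final Pattern FIELD = Pattern.compile("(\\w+)=\"([^\"]*)\"");

    /** Returns the fields in order of appearance, keeping the first value seen for each key. */
    static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            fields.putIfAbsent(m.group(1), m.group(2));
        }
        return fields;
    }

    public static void main(String[] args) {
        String good = "date=\"2008-06-18 11:44:38,984\" time=\"1213814678984\""
                + " genstamp=\"1001\" id=\"-969938011182314584\"";
        String corrupted = "date=\"2008-05-28 05:24:40,808\"   time=\"121195228080date=\"2008-05-28 06:01:58,818\""
                + "        time=\"1211954518818\"    id=\"4027181978358848026\"";
        System.out.println(parse(good).get("time"));      // prints 1213814678984
        System.out.println(parse(corrupted).get("time")); // prints 121195228080date=
    }
}
```

On the corrupted line, the extracted time value is exactly the "121195228080date=" string from the NumberFormatException in the description.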

        Koji Noguchi made changes -
        Environment 17.0 + H1979-H2159-H3442
        Koji Noguchi made changes -
        Field Original Value New Value
        Attachment jstack.H3635.txt [ 12384637 ]
        Koji Noguchi created issue -

          People

          • Assignee:
            Tsz Wo Nicholas Sze
            Reporter:
            Koji Noguchi
          • Votes:
            0
            Watchers:
            0
