  1. Hadoop Common
  2. HADOOP-3635

Uncaught exception in DataBlockScanner

Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.17.0
    • Fix Version/s: 0.18.0
    • Component/s: None
    • Labels: None
    • Environment: 17.0 + H1979-H2159-H3442
    • Hadoop Flags: Reviewed

    Description

      I see a bunch of datanodes that have stopped verifying local blocks.

      ".out" showed

      -rw-r--r--  1 hdfs users 614 Jun 23 10:24 datanode.out
      
      Exception in thread "org.apache.hadoop.dfs.DataBlockScanner@aadc97" java.lang.NumberFormatException: For input string: "121195228080date="
              at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
              at java.lang.Long.parseLong(Long.java:412)
              at java.lang.Long.valueOf(Long.java:518)
              at org.apache.hadoop.dfs.DataBlockScanner$LogEntry.parseEntry(DataBlockScanner.java:351)
              at org.apache.hadoop.dfs.DataBlockScanner.assignInitialVerificationTimes(DataBlockScanner.java:481)
              at org.apache.hadoop.dfs.DataBlockScanner.run(DataBlockScanner.java:534)
              at java.lang.Thread.run(Thread.java:619)
      

      Namenode log also showed

      2008-06-23 10:24:12,831 WARN org.apache.hadoop.dfs.DataBlockScanner: RuntimeException during DataBlockScanner.run() : java.lang.NumberFormatException: For input string: "121195228080date="
              at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
              at java.lang.Long.parseLong(Long.java:412)
              at java.lang.Long.valueOf(Long.java:518)
              at org.apache.hadoop.dfs.DataBlockScanner$LogEntry.parseEntry(DataBlockScanner.java:351)
              at org.apache.hadoop.dfs.DataBlockScanner.assignInitialVerificationTimes(DataBlockScanner.java:481)
              at org.apache.hadoop.dfs.DataBlockScanner.run(DataBlockScanner.java:534)
              at java.lang.Thread.run(Thread.java:619)
      

      The datanode was still up and running, but it performed no block verification.
      jstack did not show a DataBlockScanner thread.

      Attachments

        1. 3635_20080627.patch
          6 kB
          Tsz-wo Sze
        2. 3635_20080630.patch
          6 kB
          Tsz-wo Sze
        3. jstack.H3635.txt
          10 kB
          Koji Noguchi

        Activity

          szetszwo Tsz-wo Sze added a comment -

          DataBlockScanner object reads log files "dncp_block_verification.log.curr" and "dncp_block_verification.log.prev". These log files contain lines like

          date="2008-06-18 11:44:38,984" time="1213814678984" genstamp="1001" id="-969938011182314584"
          

          From the error message in the description, it seems to me that the log file was corrupted. (In "121195228080date=", the number looks like a timestamp, but it runs straight into the start of another line.) Could you check your log files? They can be found in your datanode directory .../hadoop-SYSTEM/dfs/data/current

          knoguchi Koji Noguchi added a comment -
          $ grep 121195228080 dncp_block_verification.log.curr
          date="2008-05-28 05:24:40,808"   time="121195228080date="2008-05-28 06:01:58,818"        time="1211954518818"    id="4027181978358848026"
          

          Nicholas, you're right. It is corrupted.
          Out of 2000 datanodes, 54 datanodes are in this state after 3 days of uptime.
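
          To see how this line yields the exact string in the stack trace, here is a minimal Java sketch. It assumes entries are key="value" pairs; the class LogLineSketch and its simplified regular expression are hypothetical and are not the DataBlockScanner.LogEntry code. The first time value has lost its closing quote, so a lazy match runs on to the next quote, which belongs to the following entry's date field.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LogLineSketch {
    // Assumed entry format: key="value"; the value is matched lazily up to the next quote.
    private static final Pattern ENTRY = Pattern.compile("(\\w+)=\"(.*?)\"");

    /** Returns the key/value pairs of one line, keeping the first value seen for each key. */
    static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = ENTRY.matcher(line);
        while (m.find()) {
            fields.putIfAbsent(m.group(1), m.group(2));
        }
        return fields;
    }

    public static void main(String[] args) {
        // Well-formed line, as quoted earlier in this issue.
        String good = "date=\"2008-06-18 11:44:38,984\" time=\"1213814678984\" "
                + "genstamp=\"1001\" id=\"-969938011182314584\"";
        // Corrupted line, as found by grep above: two entries run together.
        String bad = "date=\"2008-05-28 05:24:40,808\"   time=\"121195228080date=\"2008-05-28 06:01:58,818\""
                + "        time=\"1211954518818\"    id=\"4027181978358848026\"";

        System.out.println(Long.parseLong(parse(good).get("time"))); // 1213814678984

        String time = parse(bad).get("time");
        System.out.println(time); // 121195228080date=
        try {
            Long.parseLong(time);
        } catch (NumberFormatException e) {
            System.out.println(e.getMessage()); // For input string: "121195228080date="
        }
    }
}
```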

          szetszwo Tsz-wo Sze added a comment -

          > Out of 2000 datanodes, 54 datanodes are in this state after 3 days of uptime.

          Do you mean that no datanode was restarted during these 3 days of uptime? If that is the case, we should mark this as a 0.18 blocker.

          szetszwo Tsz-wo Sze added a comment -

          The log files are used for updating BlockScanInfo, so that the same block checksum won't be verified too frequently. If some lines of a log file are corrupted, these lines should be ignored.
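
          A rough Java sketch of that idea, assuming only the time field matters for the illustration: each line is parsed on its own, and a line whose time value is missing or not a number is skipped instead of ending the scan. The class TolerantLogReader, the method readTimes, and the pattern are hypothetical names for this example, not the code in the attached patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TolerantLogReader {
    // Hypothetical pattern for the time="..." field; not the pattern Hadoop uses.
    private static final Pattern TIME = Pattern.compile("\\btime=\"(.*?)\"");

    /** Returns the verification times of all parseable lines; unparseable lines are skipped. */
    static List<Long> readTimes(List<String> lines) {
        List<Long> times = new ArrayList<>();
        for (String line : lines) {
            Matcher m = TIME.matcher(line);
            if (!m.find()) {
                continue; // no time field at all: ignore the line
            }
            try {
                times.add(Long.parseLong(m.group(1)));
            } catch (NumberFormatException e) {
                // Corrupted value such as "121195228080date=": report it and move on.
                System.err.println("Ignoring corrupted log line: " + line);
            }
        }
        return times;
    }
}
```

          The point of catching the exception per line is that one damaged entry costs only that entry's timestamp; the remaining entries are still used.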

          szetszwo Tsz-wo Sze added a comment -

          There are race conditions on DataBlockScanner.LogFileHandler.out: some accesses to it are not synchronized.
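
          As an illustration of the general remedy, the Java sketch below guards a shared stream field with a single lock, so a writer can never use the stream while another thread is closing or replacing it. The class SynchronizedLogSketch and its methods are hypothetical; the sketch does not reproduce LogFileHandler or claim to show exactly how the corruption arose.

```java
import java.io.PrintStream;

final class SynchronizedLogSketch {
    private PrintStream out; // guarded by "this"; null once closed

    SynchronizedLogSketch(PrintStream initial) {
        this.out = initial;
    }

    /** Writes one line, or drops it if the log has been closed. */
    synchronized void appendLine(String line) {
        if (out != null) {
            out.println(line);
        }
    }

    /** Closes the current stream and swaps in a new one, e.g. when rolling the log file. */
    synchronized void roll(PrintStream next) {
        if (out != null) {
            out.close();
        }
        out = next;
    }

    /** Closes the log; later appendLine calls become no-ops. */
    synchronized void close() {
        roll(null);
    }
}
```

          Every read and write of the field goes through a synchronized method, so the null check and the write in appendLine happen atomically with respect to roll and close.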

          szetszwo Tsz-wo Sze added a comment -

          3635_20080627.patch: synchronizes all accesses to out and ignores lines in the log file that fail to parse.

          szetszwo Tsz-wo Sze added a comment -

          Promoting this to a 0.18 blocker.

          szetszwo Tsz-wo Sze added a comment -

          Passed tests locally. Submitting.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12384872/3635_20080627.patch
          against trunk revision 672376.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          -1 patch. The patch command could not apply the patch.

          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2763/console

          This message is automatically generated.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12384872/3635_20080627.patch
          against trunk revision 672376.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2768/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2768/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2768/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2768/console

          This message is automatically generated.

          hairong Hairong Kuang added a comment -

          I think flush is not needed after writing each line in the log. Otherwise the patch looks good.

          szetszwo Tsz-wo Sze added a comment -

          3635_20080630.patch: removed flush().

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12384973/3635_20080630.patch
          against trunk revision 672848.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2773/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2773/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2773/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2773/console

          This message is automatically generated.

          hairong Hairong Kuang added a comment -

          I've just committed this. Thanks, Nicholas!

          hudson Hudson added a comment -

          Integrated in Hadoop-trunk #535 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/535/ )

          People

            Assignee: szetszwo Tsz-wo Sze
            Reporter: knoguchi Koji Noguchi