Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.23.0
    • Component/s: namenode
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      HDFS-903 adds an MD5 checksum to a saved image, so that we can verify the integrity of the image at load time.

      The other half of the story is how to verify fsedits. Similarly, we could use the checksum approach, but since an fsedits file grows constantly, a checksum per file does not work. I am thinking of adding a checksum per transaction. Is this doable, or too expensive?
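      As a rough illustration of the per-transaction idea (an illustrative sketch only, not the format eventually committed; the algorithm choice is discussed in the comments below), the write path would append a small checksum after every serialized transaction so that each record can be verified on its own:

          import java.io.DataOutputStream;
          import java.io.IOException;
          import java.util.zip.CRC32;

          // Conceptual sketch only: append a checksum trailer to every edit-log record.
          // The class name and record layout are illustrative, not the committed format.
          class ChecksummedEditWriter {
            private final DataOutputStream out;    // the underlying fsedits stream
            private final CRC32 crc = new CRC32(); // stand-in algorithm for this sketch

            ChecksummedEditWriter(DataOutputStream out) {
              this.out = out;
            }

            /** Writes one serialized transaction followed by the checksum of its bytes. */
            void writeTransaction(byte[] serializedOp) throws IOException {
              crc.reset();
              crc.update(serializedOp, 0, serializedOp.length);
              out.write(serializedOp);        // opcode + payload
              out.writeLong(crc.getValue());  // per-transaction checksum trailer
            }
          }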

      Attachments

    1. editsChecksum2.patch
        17 kB
        Hairong Kuang
      2. editsChecksum1.patch
        17 kB
        Hairong Kuang
      3. editsChecksum.patch
        16 kB
        Hairong Kuang

        Activity

        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #643 (See https://builds.apache.org/hudson/job/Hadoop-Hdfs-trunk/643/)

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12476020/editsChecksum2.patch
        against trunk revision 1090357.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these core unit tests:
        org.apache.hadoop.hdfs.TestFileCreationEmpty

        -1 contrib tests. The patch failed contrib unit tests.

        +1 system test framework. The patch passed system test framework compile.

        Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/339//testReport/
        Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/339//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/339//console

        This message is automatically generated.

        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk-Commit #587 (See https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/587/)
        HDFS-1630. Support fsedits checksum. Contributed by Hairong Kuang.

        Hairong Kuang added a comment -

        The failed unit tests were not introduced by this patch.

        I've just committed this!

        Hairong Kuang added a comment -

        I updated the exception message to address the review comment.

        Aaron T. Myers added a comment -

        > When reading old versions of fsedits that do not support checksum.

        Sorry - I didn't realize you only conditionally create the checksum. Looks good to me.

        Hairong Kuang added a comment -

        > Why would checksum ever be null?
        When reading old versions of fsedits that do not support checksum.
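        As an illustration of the conditional creation (an assumed sketch; the class name and version constant below are hypothetical, not the patch's actual code), the checksum object is only instantiated when the on-disk layout version is new enough to carry one:

            import java.util.zip.Checksum;
            import org.apache.hadoop.util.PureJavaCrc32;

            // Illustrative sketch: only logs whose layout version is new enough
            // carry per-transaction checksums.
            class EditsChecksumFactory {
              // Placeholder value; NOT the real layout version that introduced the feature.
              private static final int FIRST_CHECKSUMMED_VERSION = -28;

              static Checksum maybeCreateChecksum(int logVersion) {
                // Newer HDFS layout versions are more negative than older ones.
                if (logVersion <= FIRST_CHECKSUMMED_VERSION) {
                  return new PureJavaCrc32();
                }
                return null; // old fsedits carry no checksum, hence the null checks
              }
            }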

        Aaron T. Myers added a comment -

        Code looks solid to me as well.

        One nit: In the case of a checksum mismatch, could you please add the expected and actual checksums which didn't match to the error message?

        Also, in two places you check for "if (checksum != null)". Why would checksum ever be null?

        dhruba borthakur added a comment -

        +1, code looks good to me.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12475826/editsChecksum1.patch
        against trunk revision 1090114.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these core unit tests:
        org.apache.hadoop.hdfs.server.datanode.TestBlockReport
        org.apache.hadoop.hdfs.TestDFSShell
        org.apache.hadoop.hdfs.TestFileConcurrentReader

        -1 contrib tests. The patch failed contrib unit tests.

        +1 system test framework. The patch passed system test framework compile.

        Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/333//testReport/
        Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/333//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/333//console

        This message is automatically generated.

        Hairong Kuang added a comment -

        This new patch fixes the failed test.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12475645/editsChecksum.patch
        against trunk revision 1087900.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these core unit tests:
        org.apache.hadoop.hdfs.tools.offlineImageViewer.TestOfflineImageViewer

        -1 contrib tests. The patch failed contrib unit tests.

        +1 system test framework. The patch passed system test framework compile.

        Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/322//testReport/
        Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/322//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/322//console

        This message is automatically generated.

        Hairong Kuang added a comment -

        Here is a patch that checksums every entry in fsedits using pure Java CRC.
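        A hedged sketch of what per-entry verification with the pure Java CRC could look like on the read side (the [length][payload][crc] record layout and the names below are assumptions, not necessarily the patch's on-disk format):

            import java.io.DataInputStream;
            import java.io.IOException;
            import java.util.zip.Checksum;
            import org.apache.hadoop.util.PureJavaCrc32;

            // Sketch of the verify side for a checksummed edits record.
            class ChecksummedEditReader {
              static byte[] readVerifiedTransaction(DataInputStream in) throws IOException {
                int length = in.readInt();
                byte[] payload = new byte[length];
                in.readFully(payload);

                Checksum crc = new PureJavaCrc32();
                crc.update(payload, 0, payload.length);

                long stored = in.readLong();
                if (stored != crc.getValue()) {
                  throw new IOException("Transaction is corrupt. Expected checksum "
                      + stored + " but calculated " + crc.getValue());
                }
                return payload;
              }
            }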

        Steve Loughran added a comment -

        @Hairong - yes, separate issues make sense. This one is something the 2NN needs.

        Hairong Kuang added a comment -

        Computing a CRC per transaction gives more flexibility: it allows reading an arbitrary transaction after seeking to it. This is important for use cases like the avatar standby.

        dhruba borthakur added a comment -

        We can use PureJavaCrc32. What are the pros and cons of computing a separate CRC per transaction vs. keeping a rolling CRC but storing the latest accumulated CRC at the end of each transaction?
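        For comparison, the rolling-CRC alternative described here might look roughly like this (purely illustrative; names are hypothetical): one accumulated checksum over the whole log, with its current value appended after each transaction, which means a single record cannot be verified without replaying from the start:

            import java.io.DataOutputStream;
            import java.io.IOException;
            import java.util.zip.Checksum;
            import org.apache.hadoop.util.PureJavaCrc32;

            // Illustrative rolling-CRC variant: the checksum is never reset, so each
            // stored value covers every byte written so far.
            class RollingCrcEditWriter {
              private final DataOutputStream out;
              private final Checksum rolling = new PureJavaCrc32();

              RollingCrcEditWriter(DataOutputStream out) { this.out = out; }

              void writeTransaction(byte[] serializedOp) throws IOException {
                rolling.update(serializedOp, 0, serializedOp.length); // accumulate, no reset
                out.write(serializedOp);
                out.writeLong(rolling.getValue()); // latest accumulated CRC after this txn
              }
            }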

        Tsz Wo Nicholas Sze added a comment -

        > ... I am thinking to use the pure Java CRC32 implemented in HADOOP-6148 to get the performance benefit. ...

        +1. For data integrity, CRC is probably the best choice.

        Hairong Kuang added a comment -

        Thanks all for your input. I am thinking of using the pure Java CRC32 implemented in HADOOP-6148 to get the performance benefit. The checksum will be calculated and verified per transaction, because the NN may be shut down at any time, so a per-file checksum does not work well.

        @Steve, I will focus on adding checksums to fsedits in this jira and then work on handling corrupt fsedits in a different jira. I think Chansler opened a jira for handling corrupt fsedits in the SNN a while ago. Does this sound good to you?

        Steve Loughran added a comment -

        @Hairong - yes, if every TX is checksummed, that's all that is needed. But I think the remote node receiving any checksum should have the right to report the problem so the namenode can then replay it. I don't think the risk of corruption is that high, but statistics is the enemy here: eventually some NIC with built-in TCP support will start playing up, your TXs will get corrupted before the packet checksum is generated, and then the checksum is no use to the recipient. If the secondary/backup node can check on receipt and refuse to replay, problems get found faster and the faulting hardware is more easily located.

        Tsz Wo Nicholas Sze added a comment -

        CRC-64 may be good enough.

        Aaron T. Myers added a comment -

        As Steve points out, it seems to me that MD5 is overkill if the goal is just to verify integrity.

        To address the problem of recomputing the hash of a constantly growing file, rather than checksumming each individual transaction, I suggest we use a rolling hash: http://en.wikipedia.org/wiki/Rolling_hash

        In particular, Adler-32 seems like a good choice: http://download.oracle.com/javase/1.5.0/docs/api/java/util/zip/Adler32.html
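        A small, self-contained illustration of the Adler-32 suggestion using java.util.zip.Adler32: the running checksum is extended as records are appended, so the hash of the growing file never has to be recomputed from scratch (illustrative only; the record strings are made up, and the jira ultimately went with per-transaction CRCs):

            import java.nio.charset.StandardCharsets;
            import java.util.zip.Adler32;

            // Illustrative: maintain a running Adler-32 over a growing stream of records.
            public class RollingAdlerDemo {
              public static void main(String[] args) {
                Adler32 adler = new Adler32();

                byte[] firstRecord = "OP_MKDIR /foo".getBytes(StandardCharsets.UTF_8);
                adler.update(firstRecord, 0, firstRecord.length);
                System.out.println("after record 1: " + adler.getValue());

                byte[] secondRecord = "OP_DELETE /foo".getBytes(StandardCharsets.UTF_8);
                adler.update(secondRecord, 0, secondRecord.length);
                System.out.println("after record 2: " + adler.getValue()); // extended, not recomputed
              }
            }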

        Hairong Kuang added a comment -

        I did some experiments with MD5. For every transaction, an MD5 digest (16 bytes) is calculated, but to reduce the disk overhead, only half of the digest (8 bytes) is saved to the fsedits log. I observed only 2% overhead for directory creations and deletions in fsnamesystem, for an average transaction size of 100 bytes. This seems negligible.

        > the stream to the secondary/backup node should also be checksummed to detect...
        If we have a checksum for each fsedits transaction, I do not see the need to checksum the stream. The only error case that a per-transaction checksum cannot detect is the edit log getting truncated at a transaction boundary. However, the SNN does validate the edit log size, so truncation can also be detected.
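        A hedged sketch of the experiment described above: compute the full 16-byte MD5 digest per transaction but persist only the first 8 bytes (class and method names are illustrative, not the experimental code itself):

            import java.io.DataOutputStream;
            import java.io.IOException;
            import java.security.MessageDigest;
            import java.security.NoSuchAlgorithmException;
            import java.util.Arrays;

            // Illustrative: per-transaction MD5, truncated to 8 bytes to halve the disk overhead.
            class TruncatedMd5EditWriter {
              static void writeTransaction(DataOutputStream out, byte[] serializedOp)
                  throws IOException {
                try {
                  MessageDigest md5 = MessageDigest.getInstance("MD5");
                  byte[] digest = md5.digest(serializedOp);   // full 16-byte digest
                  byte[] half = Arrays.copyOf(digest, 8);     // keep only the first 8 bytes
                  out.write(serializedOp);
                  out.write(half);                            // truncated digest trailer
                } catch (NoSuchAlgorithmException e) {
                  throw new IOException("MD5 not available", e); // should never happen on a JRE
                }
              }
            }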

        Steve Loughran added a comment -

        1. Yes, checksums that are reasonably strong; do you need SHA1/MD5, or is CRC32 enough?
        2. The stream to the secondary/backup node should also be checksummed to detect problems in machines, network cards, etc. What is complicated here is that the receiver needs the right to say "the edit you just sent me is corrupt, resend"; both ends should log this event, etc.


          People

          • Assignee:
            Hairong Kuang
          • Reporter:
            Hairong Kuang
          • Votes:
            0
          • Watchers:
            9
