Details

    • Type: Sub-task Sub-task
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: QuorumJournalManager (HDFS-3077)
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      When the QJM passes journal segments between nodes, it should use an md5sum field to make sure the data doesn't get corrupted during transit. This also serves as an extra safe-guard to make sure that the data is consistent across all nodes when finalizing a segment.

      1. hdfs-3859-sha1.txt
        13 kB
        Andy Isaacson
      2. hdfs-3859.txt
        10 kB
        Todd Lipcon

        Issue Links

          Activity

          Hide
          Steve Loughran added a comment -

          Isn't MD5 overkill? Can't a good CRC (like TCP Jumbo Frames uses) suffice?

          Show
          Steve Loughran added a comment - Isn't MD5 overkill? Can't a good CRC (like TCP Jumbo Frames uses) suffice?
          Hide
          Todd Lipcon added a comment -

          Sure, it's overkill, but it's not that expensive and we already have an implementation of it sitting around. It's also handy because "md5sum" is commonly available on the command line, and we use it for FSImages already as well. Performance-wise, my laptop can md5sum at about 500MB/sec, so given that log segments under recovery are likely to be much smaller than 500M, I don't think we should be concerned about that.

          Show
          Todd Lipcon added a comment - Sure, it's overkill, but it's not that expensive and we already have an implementation of it sitting around. It's also handy because "md5sum" is commonly available on the command line, and we use it for FSImages already as well. Performance-wise, my laptop can md5sum at about 500MB/sec, so given that log segments under recovery are likely to be much smaller than 500M, I don't think we should be concerned about that.
          Hide
          Andy Isaacson added a comment -

          Please consider using SHA1 rather than MD5. The performance should be comparable (SHA1 is about 2.5% faster in my quick test, but that's "equal" by any rational measure). The hash is much less awfully broken. And it's one fewer place where we'll need to continue supporting legacy insecure code in the future.

          Show
          Andy Isaacson added a comment - Please consider using SHA1 rather than MD5. The performance should be comparable (SHA1 is about 2.5% faster in my quick test, but that's "equal" by any rational measure). The hash is much less awfully broken. And it's one fewer place where we'll need to continue supporting legacy insecure code in the future.
          Hide
          Todd Lipcon added a comment -

          We don't have existing SHA1 implementations, and this isn't about security. It's just to guard against bugs or in-flight corruption. Security is taken care of by other layers (eg SPNEGO on the image transfer). I dont want to add new code and switch hashes for no good reason.

          Show
          Todd Lipcon added a comment - We don't have existing SHA1 implementations, and this isn't about security. It's just to guard against bugs or in-flight corruption. Security is taken care of by other layers (eg SPNEGO on the image transfer). I dont want to add new code and switch hashes for no good reason.
          Hide
          Andy Isaacson added a comment -

          Attaching SHA1 implementation.

          Show
          Andy Isaacson added a comment - Attaching SHA1 implementation.
          Hide
          Todd Lipcon added a comment -

          I get it that it can be done. But, why? This is not a cryptographic use case. There's no sense adding new code.

          Show
          Todd Lipcon added a comment - I get it that it can be done. But, why? This is not a cryptographic use case. There's no sense adding new code.
          Hide
          Andy Isaacson added a comment -

          Bunch of touchy feely reasons:

          1. mere presence of MD5 in the code is a red flag to auditors. We then have to spend time arguing with people over whether it's a security issue.
          2. think of broken crypto protocols as similar to asbestos. It's perfectly possible to create safe products that include asbestos (and in some cases they're much cheaper and better performing than a non-asbestos-containing substitute). But nobody does this because the stigma is too great, and too many times a vendor's statement "my product is safe" turns out to be false, so it's easier to just move on.
          3. building new code that uses old broken primitives just extends the time before we can delete the implementations of the old broken primitives. Better to move forward when the opportunity exists.
          Show
          Andy Isaacson added a comment - Bunch of touchy feely reasons: mere presence of MD5 in the code is a red flag to auditors. We then have to spend time arguing with people over whether it's a security issue. think of broken crypto protocols as similar to asbestos. It's perfectly possible to create safe products that include asbestos (and in some cases they're much cheaper and better performing than a non-asbestos-containing substitute). But nobody does this because the stigma is too great, and too many times a vendor's statement "my product is safe" turns out to be false, so it's easier to just move on. building new code that uses old broken primitives just extends the time before we can delete the implementations of the old broken primitives. Better to move forward when the opportunity exists.
          Hide
          Andy Isaacson added a comment -

          I've opened HDFS-3868 as a placeholder for the SHA1 implementation, so that the next time someone says "we don't have an existing implementation" we have somewhere to go.

          Show
          Andy Isaacson added a comment - I've opened HDFS-3868 as a placeholder for the SHA1 implementation, so that the next time someone says "we don't have an existing implementation" we have somewhere to go.
          Hide
          Steve Loughran added a comment -

          @Todd : this is why a CRC check would be simpler. Faster and less controversial.

          Show
          Steve Loughran added a comment - @Todd : this is why a CRC check would be simpler. Faster and less controversial.
          Hide
          Todd Lipcon added a comment -

          Attached patch fills in the md5sum of the file during recovery, and verifies it when the log is transfered via HTTP.

          Show
          Todd Lipcon added a comment - Attached patch fills in the md5sum of the file during recovery, and verifies it when the log is transfered via HTTP.
          Hide
          Aaron T. Myers added a comment -

          +1, the patch looks good to me. Good work, Todd.

          Show
          Aaron T. Myers added a comment - +1, the patch looks good to me. Good work, Todd.
          Hide
          Todd Lipcon added a comment -

          Actually this isn't committable yet. Before committing I ran it through the randomized fault tests and found some issues – mostly around the fact that two log segments may be semantically equivalent but their md5sums may differ (due to the padding at the end of the file after the transactions). It's looking like working around this is going to be somewhat difficult... so I may just remove the md5sum field for now until we have some better way of handling this.

          Show
          Todd Lipcon added a comment - Actually this isn't committable yet. Before committing I ran it through the randomized fault tests and found some issues – mostly around the fact that two log segments may be semantically equivalent but their md5sums may differ (due to the padding at the end of the file after the transactions). It's looking like working around this is going to be somewhat difficult... so I may just remove the md5sum field for now until we have some better way of handling this.
          Hide
          Todd Lipcon added a comment -

          Filed HDFS-3943 to temporarily remove the md5sum field. I'll continue to work on this JIRA over the next few weeks, but in the meantime the md5sum code is just cruft in the logs.

          Show
          Todd Lipcon added a comment - Filed HDFS-3943 to temporarily remove the md5sum field. I'll continue to work on this JIRA over the next few weeks, but in the meantime the md5sum code is just cruft in the logs.

            People

            • Assignee:
              Todd Lipcon
              Reporter:
              Todd Lipcon
            • Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:

                Development