Ratis / RATIS-2147

MD5 mismatch when accepting a snapshot


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1.1
    • Component/s: snapshot
    • Labels: None

    Description

      We encountered an MD5 mismatch in IoTDB. After several rounds of investigation, we found that the digester had been contaminated.

      We have ruled out network and disk problems.

      In the implementation, a received snapshot is first written to a temporary file. When an MD5 mismatch occurs, we re-read the data from this temporary file and compute the MD5 with a new digester. The result of this recomputation is identical to the MD5 hash that was sent.
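      The recheck described above can be sketched in Java as follows. This is a minimal sketch, assuming the snapshot has been fully written to a temporary file and the sender's MD5 is available as a byte array; the class and method names (Md5Recheck, newMd5, md5OfFile, matchesSentDigest) are hypothetical and are not Ratis or IoTDB APIs. Only the standard java.security.MessageDigest class is used.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: recompute a file's MD5 with a digester that has never been used before.
final class Md5Recheck {
  /** Creates a brand-new MD5 digester, so no state can leak in from earlier transfers. */
  static MessageDigest newMd5() {
    try {
      return MessageDigest.getInstance("MD5");
    } catch (NoSuchAlgorithmException e) {
      // Every Java SE platform is required to support MD5, so this should not happen.
      throw new IllegalStateException(e);
    }
  }

  /** Re-reads the temporary snapshot file and computes its MD5 with a fresh digester. */
  static byte[] md5OfFile(Path tmpFile) throws IOException {
    MessageDigest fresh = newMd5();
    try (InputStream in = Files.newInputStream(tmpFile)) {
      byte[] buf = new byte[8192];
      int n;
      while ((n = in.read(buf)) != -1) {
        fresh.update(buf, 0, n);
      }
    }
    return fresh.digest();
  }

  /** True if the data on disk hashes to the MD5 value the sender transmitted. */
  static boolean matchesSentDigest(Path tmpFile, byte[] sentMd5) throws IOException {
    return MessageDigest.isEqual(md5OfFile(tmpFile), sentMd5);
  }
}
```

      If this returns true while the original check reported a mismatch, the data on disk and the sent hash are both intact, which points at the state of the digester used during the original check.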

      We used the saved corrupted file name to locate the relevant logs. Take tlog.txt.snapshot.snapshot.corrupt20240831-094107_735 as an example.

      Before the corruption was detected, the sender had sent several consecutive snapshot installation requests to the receiver.

      The receiver successfully processed some of these requests, then hit the corrupted one and began logging "recompute again" as it recomputed the digest.

      After the recomputation, the ERROR log for the rename is printed, and the data is read back from the file and compared with the received chunk data.

      Any mismatched byte would have been logged with its details, but no such log appeared. This means the content written to disk is identical to the content that was sent.
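      The read-back comparison in the previous paragraph can be sketched as below. This is an illustrative sketch under stated assumptions, not the actual debugging code used in IoTDB or Ratis: it assumes the chunk's offset within the file is known, and the names ChunkVerifier and firstMismatch are hypothetical. It returns the index of the first differing byte, or -1 when the file matches the received chunk.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;

// Hypothetical helper: compare bytes on disk against the chunk that was received.
final class ChunkVerifier {
  /**
   * Reads chunk.length bytes starting at offset and compares them with the received chunk.
   * Returns the index within the chunk of the first mismatching byte, or -1 if all match.
   */
  static int firstMismatch(Path file, long offset, byte[] chunk) throws IOException {
    byte[] onDisk = new byte[chunk.length];
    try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
      raf.seek(offset);
      raf.readFully(onDisk); // throws EOFException if the file is shorter than expected
    }
    for (int i = 0; i < chunk.length; i++) {
      if (onDisk[i] != chunk[i]) {
        // Report the absolute file position together with the sent and stored values.
        System.err.printf("mismatch at %d: sent=%d, disk=%d%n", offset + i, chunk[i], onDisk[i]);
        return i;
      }
    }
    return -1; // no output at all, matching the "no log information" case in the report
  }
}
```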

      This makes the cause clear: the problem lies in the MD5 calculation class, for the following reasons:

          • If a byte in the data portion had been corrupted in transit, the recomputed hash would differ from the hash that was sent.

          • If the part that stores the hash value had been corrupted, the recomputed hash would likewise differ.

      Since the recomputation with a new digester matches the sent hash, neither case applies; the remaining explanation is that the original digester was contaminated.

       
      I suggest creating a new digester every time a follower receives a snapshot, to avoid this contamination. Under normal network and disk conditions, corruption will then not occur.
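      The suggested fix can be illustrated with a small Java sketch contrasting a long-lived shared digester with one created per snapshot. The contamination scenario used here (an install abandoned after some update() calls but before digest()) is an assumption chosen for illustration; the report does not state exactly how the digester was polluted. The names DigesterPerSnapshot, digestWithShared, digestWithFresh and abandonedInstallOnShared are hypothetical, not Ratis APIs.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration of shared vs. per-snapshot digesters.
final class DigesterPerSnapshot {
  static MessageDigest newMd5() {
    try {
      return MessageDigest.getInstance("MD5");
    } catch (NoSuchAlgorithmException e) {
      // MD5 is required on every Java SE platform, so this should not happen.
      throw new IllegalStateException(e);
    }
  }

  // Reused across snapshot installs: any leftover state leaks into the next install.
  private static final MessageDigest SHARED = newMd5();

  /** Assumed failure mode: an install updates the shared digester, then stops before digest(). */
  static void abandonedInstallOnShared(byte[] partial) {
    SHARED.update(partial);
  }

  /** Digests all chunks with the shared instance; the result includes any leftover bytes. */
  static byte[] digestWithShared(byte[][] chunks) {
    for (byte[] c : chunks) {
      SHARED.update(c);
    }
    return SHARED.digest(); // digest() resets SHARED, but only once an install reaches this point
  }

  /** Digests all chunks with a digester created for this snapshot only. */
  static byte[] digestWithFresh(byte[][] chunks) {
    MessageDigest md = newMd5(); // no state can carry over from a previous install
    for (byte[] c : chunks) {
      md.update(c);
    }
    return md.digest();
  }
}
```

      MessageDigest.digest() resets the digester, so a shared instance stays clean only if every transfer runs to completion. Allocating one digester per snapshot removes that dependency at the cost of a cheap object creation.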


People

    • Assignee: tohsakarin__ yuuka
    • Reporter: tohsakarin__ yuuka
    • Votes: 0
    • Watchers: 2


Time Tracking

    • Original Estimate: Not Specified
    • Remaining Estimate: 0h
    • Time Spent: 5h 20m
