Hadoop Common / HADOOP-3981

Need a distributed file checksum algorithm for HDFS


Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.19.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Incompatible change, Reviewed
    • Release Note:
      Implemented MD5-of-xxxMD5-of-yyyCRC32, a distributed file checksum algorithm for HDFS, where xxx is the number of CRCs per block and yyy is the number of bytes per CRC.

      Changed DistCp to use file checksum for comparing files if both source and destination FileSystem(s) support getFileChecksum(...).
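
      The nesting in the algorithm name can be illustrated with a minimal, single-process sketch: a CRC32 per chunk of yyy bytes, an MD5 over each block's CRCs, and an MD5 over the block MD5s. The function names, the small default parameters, the big-endian CRC framing, and the output string format are assumptions made for illustration; this sketch is not expected to reproduce the values HDFS returns from getFileChecksum(...).

      ```python
      import hashlib
      import struct
      import zlib


      def block_md5(block: bytes, bytes_per_crc: int) -> bytes:
          """MD5 over the concatenated CRC32 values of one block's chunks."""
          md5 = hashlib.md5()
          for offset in range(0, len(block), bytes_per_crc):
              crc = zlib.crc32(block[offset:offset + bytes_per_crc])
              # Big-endian 4-byte framing is an assumption of this sketch.
              md5.update(struct.pack(">I", crc))
          return md5.digest()


      def file_checksum(data: bytes, bytes_per_crc: int = 512,
                        crcs_per_block: int = 4) -> str:
          """MD5 over the per-block MD5 digests, labelled with its parameters."""
          block_size = bytes_per_crc * crcs_per_block
          outer = hashlib.md5()
          for start in range(0, len(data), block_size):
              # In a distributed setting each block digest could be computed
              # where the block is stored; only the digests are combined here.
              outer.update(block_md5(data[start:start + block_size], bytes_per_crc))
          return f"MD5-of-{crcs_per_block}MD5-of-{bytes_per_crc}CRC32:{outer.hexdigest()}"


      def same_content(a: bytes, b: bytes, **params) -> bool:
          """Compare two inputs by checksum, as a copy tool might do."""
          return file_checksum(a, **params) == file_checksum(b, **params)
      ```

      Because the result depends on both parameters, two checksums in this sketch are comparable only when they were computed with the same bytes per CRC and CRCs per block; a comparison across different settings says nothing about whether the contents match.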

    Description

      Traditional message digest algorithms, such as MD5 and SHA1, require reading the entire input message sequentially in a central location. HDFS supports very large files, up to multiple terabytes in size, so reading an entire file in one place just to compute its checksum is prohibitively expensive. A distributed file checksum algorithm is needed for HDFS.

      Attachments

        1. 3981_20080909.patch (22 kB, Tsz-wo Sze)
        2. 3981_20080910.patch (29 kB, Tsz-wo Sze)
        3. 3981_20080910b.patch (29 kB, Tsz-wo Sze)
        4. 3981_20080912.patch (28 kB, Tsz-wo Sze)


          People

            Assignee: Tsz-wo Sze (szetszwo)
            Reporter: Tsz-wo Sze (szetszwo)
            Votes: 0
            Watchers: 5
