  1. Hadoop Common
  2. HADOOP-3981

Need a distributed file checksum algorithm for HDFS

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.19.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change, Reviewed
    • Release Note:
      Implemented MD5-of-xxxMD5-of-yyyCRC32 which is a distributed file checksum algorithm for HDFS, where xxx is the number of CRCs per block and yyy is the number of bytes per CRC.
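
The naming scheme above can be read as a three-level computation: a CRC32 per chunk of yyy bytes, an MD5 over the CRCs of each block, and an MD5 over the per-block MD5s. The following is a minimal single-process sketch of that idea, not the code from the attached patches. The class name, method names, and the choice to serialize each CRC as 4 big-endian bytes are assumptions made for illustration, so the output is not guaranteed to match what HDFS returns.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.CRC32;

// Hypothetical illustration class; not part of Hadoop.
final class DistributedChecksumSketch {

    private static MessageDigest newMd5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    /** MD5 over the CRC32 values of one block, one CRC per bytesPerCrc chunk. */
    static byte[] blockDigest(byte[] data, int offset, int length, int bytesPerCrc) {
        MessageDigest md5 = newMd5();
        CRC32 crc = new CRC32();
        for (int pos = 0; pos < length; pos += bytesPerCrc) {
            int n = Math.min(bytesPerCrc, length - pos);
            crc.reset();
            crc.update(data, offset + pos, n);
            // Assumption: each CRC is fed to MD5 as 4 big-endian bytes.
            md5.update(ByteBuffer.allocate(4).putInt((int) crc.getValue()).array());
        }
        return md5.digest();
    }

    /** MD5 over the per-block digests; only 16 bytes per block reach this step. */
    static byte[] fileDigest(byte[] data, int blockSize, int bytesPerCrc) {
        MessageDigest md5 = newMd5();
        for (int off = 0; off < data.length; off += blockSize) {
            int len = Math.min(blockSize, data.length - off);
            md5.update(blockDigest(data, off, len, bytesPerCrc));
        }
        return md5.digest();
    }

    public static void main(String[] args) {
        byte[] data = new byte[10_000];
        StringBuilder hex = new StringBuilder();
        for (byte b : fileDigest(data, 4096, 512)) {
            hex.append(String.format("%02x", b));
        }
        System.out.println(hex);
    }
}
```

In a distributed setting, `blockDigest` is the part that could run where each block is stored, so the caller of `fileDigest` would only need to collect one small digest per block rather than the block contents. Because the block size and bytes-per-CRC both change the intermediate values, two files are only comparable under this scheme when both parameters match, which is presumably why xxx and yyy appear in the algorithm name.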

      Changed DistCp to use file checksum for comparing files if both source and destination FileSystem(s) support getFileChecksum(...).
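
The comparison rule described above (use checksums only when both sides can provide one) can be sketched as follows. This is a hypothetical stand-alone model, not DistCp code: `Checksum`, `Result`, and `compare` are illustrative names, and `Optional.empty()` stands in for a file system that does not support getFileChecksum(...). Treating a mismatch in algorithm name as "unknown" is also an assumption.

```java
import java.util.Arrays;
import java.util.Optional;

// Hypothetical illustration class; not part of Hadoop or DistCp.
final class ChecksumCompareSketch {

    enum Result { SAME, DIFFERENT, UNKNOWN }

    /** Stand-in for a file checksum: an algorithm name plus the digest bytes. */
    static final class Checksum {
        final String algorithm;
        final byte[] bytes;

        Checksum(String algorithm, byte[] bytes) {
            this.algorithm = algorithm;
            this.bytes = bytes.clone();
        }
    }

    /**
     * Compares two files by checksum when possible.
     * Returns UNKNOWN if either side has no checksum or the algorithms differ,
     * in which case the caller has to fall back to some other comparison.
     */
    static Result compare(Optional<Checksum> src, Optional<Checksum> dst) {
        if (!src.isPresent() || !dst.isPresent()) {
            return Result.UNKNOWN;
        }
        Checksum s = src.get();
        Checksum d = dst.get();
        if (!s.algorithm.equals(d.algorithm)) {
            // Digests from different algorithms or parameters are not comparable.
            return Result.UNKNOWN;
        }
        return Arrays.equals(s.bytes, d.bytes) ? Result.SAME : Result.DIFFERENT;
    }
}
```

The three-valued result keeps "checksums differ" separate from "checksums unavailable", so a caller never mistakes an unsupported file system for a mismatch.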

      Description

      Traditional message digest algorithms, such as MD5 and SHA-1, require reading the entire input message sequentially in a central location. HDFS supports large files of multiple terabytes, so the overhead of reading an entire file this way is huge. A distributed file checksum algorithm is needed for HDFS.

        Attachments

        1. 3981_20080909.patch
          22 kB
          Tsz Wo Nicholas Sze
        2. 3981_20080910.patch
          29 kB
          Tsz Wo Nicholas Sze
        3. 3981_20080910b.patch
          29 kB
          Tsz Wo Nicholas Sze
        4. 3981_20080912.patch
          28 kB
          Tsz Wo Nicholas Sze

              People

              • Assignee:
                szetszwo Tsz Wo Nicholas Sze
                Reporter:
                szetszwo Tsz Wo Nicholas Sze
              • Votes:
                0
                Watchers:
                6
