Hadoop HDFS / HDFS-13056

Expose file-level composite CRCs in HDFS which are comparable across different instances/layouts


Details

    • Reviewed

      This feature adds a new `COMPOSITE_CRC` FileChecksum type which uses CRC composition to remain completely chunk/block agnostic, and allows comparison between striped vs replicated files, between different HDFS instances, and even between HDFS and other external storage systems or local files.

      Without this option, HDFS FileChecksums are historically computed as "MD5 of MD5 of CRC" across chunks and blocks, so that the checksums are dependent on the configured chunk and block sizes, rather than only reflecting the logical file contents. The feature works by changing the way individual chunk and block checksums are combined at invocation time, so doesn't depend on or affect any persistent metadata; it can be used in-place in existing HDFS deployments without any recomputation of block metadata.

      This option can be enabled or disabled at the granularity of individual client calls by setting the new configuration option `dfs.checksum.combine.mode` to `COMPOSITE_CRC`:

          hdfs dfs -Ddfs.checksum.combine.mode=COMPOSITE_CRC -checksum hdfs:///tmp/foo.txt

      By default, this configuration option is set to `MD5MD5CRC`, which preserves the fully backwards-compatible existing behavior of HDFS.

      Note that this "combine mode" is orthogonal to the existing `dfs.checksum.type` configuration option, which configures the checksum algorithm for the individual chunk CRCs stored in block metadata: `CRC32` specifies the legacy "gzip" CRC polynomial, and `CRC32C` specifies the "Castagnoli" polynomial. The composite CRC mode adheres to the checksum type stored in the block metadata; if an HDFS instance stores a mixture of files with `CRC32` and `CRC32C` types, then listing FileChecksums with `COMPOSITE_CRC` will likewise produce the corresponding mixture of `COMPOSITE-CRC32` and `COMPOSITE-CRC32C` types.

    Description

      FileChecksum was first introduced in https://issues.apache.org/jira/browse/HADOOP-3981 and has ever since remained defined as MD5-of-MD5-of-CRC: per-512-byte chunk CRCs are already stored as part of datanode metadata, and the MD5 approach computes an aggregate value in a distributed manner, with individual datanodes computing the per-block MD5-of-CRCs in parallel and the HDFS client computing the second-level MD5.
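      The aggregation can be sketched in a few lines of Python. This is an illustrative model only — real HDFS reads the chunk CRCs from block metadata and runs the first-level MD5s on the datanodes, and the chunk size, polynomial, and byte ordering here are simplifying assumptions — but it captures the MD5-of-MD5-of-CRC hierarchy:

```python
import hashlib
import zlib

def md5_md5_crc(data: bytes, bytes_per_crc: int, crcs_per_block: int) -> str:
    """Toy model of MD5-of-MD5-of-CRC: CRC each chunk, MD5 the chunk CRCs
    within each block, then MD5 the per-block MD5s. Not wire-compatible
    with the actual HDFS FileChecksum."""
    chunk_crcs = [zlib.crc32(data[i:i + bytes_per_crc]).to_bytes(4, "big")
                  for i in range(0, len(data), bytes_per_crc)]
    block_md5s = [hashlib.md5(b"".join(chunk_crcs[j:j + crcs_per_block])).digest()
                  for j in range(0, len(chunk_crcs), crcs_per_block)]
    return hashlib.md5(b"".join(block_md5s)).hexdigest()

data = b"The quick brown fox jumps over the lazy dog" * 100
# Same logical bytes, different chunk/block layout -> different digest:
print(md5_md5_crc(data, 512, 128))
print(md5_md5_crc(data, 256, 128))
```

      The last two lines make the shortcoming concrete: changing only the chunk size changes the digest even though the file contents are identical, which is exactly the block/chunk-size sensitivity described below.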

       

      A shortcoming of this approach which is often brought up is the fact that this FileChecksum is sensitive to the internal block-size and chunk-size configuration, and thus different HDFS files with different block/chunk settings cannot be compared. More commonly, one might have different HDFS clusters which use different block sizes, in which case any data migration won't be able to use the FileChecksum for distcp's rsync functionality or for verifying end-to-end data integrity (on top of low-level data integrity checks applied at data transfer time).

       

      This was also revisited in https://issues.apache.org/jira/browse/HDFS-8430 during the addition of checksum support for striped erasure-coded files; while there was some discussion of using CRC composability, it ultimately settled on the hierarchical MD5 approach, which adds the further problem that checksums of basic replicated files are not comparable to those of striped files.

       

      This feature proposes to add a "COMPOSITE-CRC" FileChecksum type which uses CRC composition to remain completely chunk/block agnostic, allowing comparison between striped vs replicated files, between different HDFS instances, and possibly even between HDFS and other external storage systems. The feature can be adopted in-place, remaining compatible with existing block metadata, and doesn't change the normal path of chunk verification, so it is minimally invasive. This also means even large preexisting HDFS deployments could adopt it to retroactively sync data. A detailed design document can be found here: https://storage.googleapis.com/dennishuo/hdfs-file-composite-crc32-v1.pdf
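      The CRC composition the design relies on can be demonstrated outside HDFS. The sketch below is a Python port of zlib's well-known `crc32_combine` algorithm (GF(2) matrix exponentiation over the reflected "gzip" polynomial, i.e. the `dfs.checksum.type=CRC32` case): given crc(A), crc(B), and len(B), it computes crc(A·B) without re-reading A, which is what makes a composed file-level CRC independent of chunk and block boundaries:

```python
import zlib

GZIP_POLY = 0xEDB88320  # reflected CRC-32 ("gzip") polynomial

def _gf2_times(mat, vec):
    # Multiply a 32x32 GF(2) matrix (list of 32-bit column ints) by a vector.
    total, i = 0, 0
    while vec:
        if vec & 1:
            total ^= mat[i]
        vec >>= 1
        i += 1
    return total

def _gf2_square(mat):
    # Square a GF(2) matrix: compose the operator with itself.
    return [_gf2_times(mat, mat[n]) for n in range(32)]

def crc32_combine(crc1, crc2, len2):
    """crc32(A + B) from crc32(A), crc32(B), and len(B) in bytes."""
    # Operator matrix for appending a single zero bit to the stream.
    odd = [GZIP_POLY] + [1 << n for n in range(31)]
    even = _gf2_square(odd)   # operator for 2 zero bits
    odd = _gf2_square(even)   # operator for 4 zero bits
    while True:
        even = _gf2_square(odd)  # first pass: 8 zero bits = 1 zero byte
        if len2 & 1:
            crc1 = _gf2_times(even, crc1)
        len2 >>= 1
        if len2 == 0:
            break
        odd = _gf2_square(even)
        if len2 & 1:
            crc1 = _gf2_times(odd, crc1)
        len2 >>= 1
        if len2 == 0:
            break
    return crc1 ^ crc2

a, b = b"block one contents...", b"block two contents..."
assert crc32_combine(zlib.crc32(a), zlib.crc32(b), len(b)) == zlib.crc32(a + b)
```

      In HDFS terms, each datanode can compose the CRCs of its chunks into a per-block CRC and the client can compose the per-block CRCs into a file-level CRC; the result depends only on the logical byte stream and the polynomial, never on where chunk or block boundaries fall.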

      Attachments

        1. HDFS-13056.001.patch
          80 kB
          Dennis Huo
        2. HDFS-13056.002.patch
          123 kB
          Dennis Huo
        3. HDFS-13056.003.patch
          121 kB
          Dennis Huo
        4. HDFS-13056.003.patch
          121 kB
          Dennis Huo
        5. HDFS-13056.004.patch
          122 kB
          Dennis Huo
        6. HDFS-13056.005.patch
          123 kB
          Dennis Huo
        7. HDFS-13056.006.patch
          123 kB
          Dennis Huo
        8. HDFS-13056.007.patch
          123 kB
          Dennis Huo
        9. HDFS-13056.008.patch
          128 kB
          Dennis Huo
        10. HDFS-13056.009.patch
          129 kB
          Dennis Huo
        11. HDFS-13056.010.patch
          142 kB
          Dennis Huo
        12. HDFS-13056.011.patch
          142 kB
          Dennis Huo
        13. HDFS-13056.012.patch
          145 kB
          Dennis Huo
        14. HDFS-13056.013.patch
          145 kB
          Dennis Huo
        15. HDFS-13056.014.patch
          148 kB
          Dennis Huo
        16. HDFS-13056-branch-2.8.001.patch
          38 kB
          Dennis Huo
        17. HDFS-13056-branch-2.8.002.patch
          81 kB
          Dennis Huo
        18. HDFS-13056-branch-2.8.003.patch
          81 kB
          Dennis Huo
        19. HDFS-13056-branch-2.8.004.patch
          81 kB
          Dennis Huo
        20. HDFS-13056-branch-2.8.005.patch
          82 kB
          Dennis Huo
        21. HDFS-13056-branch-2.8.poc1.patch
          30 kB
          Dennis Huo
        22. hdfs-file-composite-crc32-v1.pdf
          197 kB
          Dennis Huo
        23. hdfs-file-composite-crc32-v2.pdf
          198 kB
          Dennis Huo
        24. hdfs-file-composite-crc32-v3.pdf
          219 kB
          Dennis Huo
        25. Reference_only_zhen_PPOC_hadoop2.6.X.diff
          76 kB
          zhenzhao wang


            People

              Assignee: dennishuo (Dennis Huo)
              Reporter: dennishuo (Dennis Huo)
              Votes: 0
              Watchers: 31

              Dates

                Created:
                Updated:
                Resolved: