Hadoop Common / HADOOP-12326

Implement ChecksumFileSystem#getFileChecksum equivalent to HDFS for easy check

    Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.7.1
    • Fix Version/s: None
    • Component/s: fs
    • Labels: None
    • Target Version/s:

      Description

      When the same file exists both locally and on HDFS (for example, after a download or an upload), getFileChecksum can provide a quick check that the two copies are consistent. To this end, ChecksumFileSystem can switch to CRC32C on the local filesystem. The difference in block sizes does not matter, because on the local filesystem the block size is just a logical parameter, which can be set to the HDFS value with -Dfs.local.block.size as in the example below.

      $ hadoop fs -Dfs.local.block.size=134217728 -checksum file:${PWD}/part-m-00000 part-m-00000
      15/08/15 13:30:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
      file:///Users/gshegalov/workspace/hadoop-common/part-m-00000	MD5-of-262144MD5-of-512CRC32C	000002000000000000040000e84fb07f8c9d4ef3acb5d1983a7e2a68
      part-m-00000	MD5-of-262144MD5-of-512CRC32C	000002000000000000040000e84fb07f8c9d4ef3acb5d1983a7e2a68
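
      The checksum string in the output above can be read as a small header followed by a digest. The sketch below decodes it under an assumed layout that is inferred from the output itself and not stated in this issue: a 4-byte big-endian bytes-per-CRC value, an 8-byte big-endian CRCs-per-block value, then a 16-byte MD5. The class and field names are hypothetical. Under that assumption, the header matches the algorithm name MD5-of-262144MD5-of-512CRC32C, and 262144 is exactly the fs.local.block.size of 134217728 divided by 512, which is consistent with the block size being only a logical parameter on the local side.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of Hadoop: decodes the hex checksum printed by
// "hadoop fs -checksum", ASSUMING the layout int(bytesPerCrc), long(crcPerBlock), MD5.
final class ChecksumLayout {

    /** Fields recovered from the hex string; names are illustrative only. */
    static final class Decoded {
        final int bytesPerCrc;
        final long crcPerBlock;
        final String md5Hex;

        Decoded(int bytesPerCrc, long crcPerBlock, String md5Hex) {
            this.bytesPerCrc = bytesPerCrc;
            this.crcPerBlock = crcPerBlock;
            this.md5Hex = md5Hex;
        }
    }

    static Decoded decode(String hex) {
        // 4 + 8 + 16 bytes = 28 bytes = 56 hex characters under the assumed layout.
        if (hex.length() != 56) {
            throw new IllegalArgumentException("expected 56 hex characters, got " + hex.length());
        }
        byte[] raw = new byte[28];
        for (int i = 0; i < raw.length; i++) {
            raw[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        ByteBuffer buf = ByteBuffer.wrap(raw); // ByteBuffer is big-endian by default
        int bytesPerCrc = buf.getInt();
        long crcPerBlock = buf.getLong();
        // The remaining 16 bytes (32 hex characters) are taken to be the MD5 digest.
        return new Decoded(bytesPerCrc, crcPerBlock, hex.substring(24));
    }

    public static void main(String[] args) {
        // Checksum string copied from the command output in this issue.
        Decoded d = decode("000002000000000000040000e84fb07f8c9d4ef3acb5d1983a7e2a68");
        long localBlockSize = 134217728L; // value passed via -Dfs.local.block.size

        System.out.println("bytesPerCrc = " + d.bytesPerCrc);
        System.out.println("crcPerBlock = " + d.crcPerBlock);
        System.out.println("md5         = " + d.md5Hex);
        System.out.println("blockSize / bytesPerCrc == crcPerBlock: "
                + (localBlockSize / d.bytesPerCrc == d.crcPerBlock));
    }
}
```

      Because both lines of the output carry the same header and digest, the local and HDFS copies compare equal. With a different logical block size on the local side, the CRCs-per-block field would presumably differ and the strings would not match even for identical content.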
      

        Attachments

        1. HADOOP-12326.007.patch
          27 kB
          Gera Shegalov
        2. HADOOP-12326.005.patch
          24 kB
          Gera Shegalov
        3. HADOOP-12326.004.patch
          18 kB
          Gera Shegalov
        4. HADOOP-12326.003.patch
          18 kB
          Gera Shegalov
        5. HADOOP-12326.002.patch
          13 kB
          Gera Shegalov
        6. HADOOP-12326.001.patch
          11 kB
          Gera Shegalov

            People

            • Assignee:
              Gera Shegalov
              Reporter:
              Gera Shegalov
            • Votes:
              1
              Watchers:
              16