Hadoop HDFS / HDFS-13960

hdfs dfs -checksum command should optionally show block size in output


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.3.0
    • Component/s: hdfs
    • Labels:
      None

      Description

      The hdfs dfs -checksum command computes the file checksum in a distributed manner: checksums are computed per block and the per-block results are then combined. The result therefore depends on the block size, because the block size determines how the file is split up.

      Therefore the checksum command can produce different outputs for two copies of exactly the same file that differ only in block size. In other words, combining the checksums of a file's parts does not give the checksum of the whole file: checksum(fileABlock1) + checksum(fileABlock2) != checksum(fileABlock1 + fileABlock2)
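To illustrate why the block size matters, here is a simplified model in Python. It is an assumption for illustration, not HDFS's actual algorithm (the default HDFS file checksum, MD5MD5CRC32FileChecksum, is an MD5 over per-block MD5s of chunk CRCs): the sketch splits the bytes into blocks, hashes each block with MD5, and then hashes the concatenated block digests. The function name block_composed_checksum is hypothetical.

```python
import hashlib


def block_composed_checksum(data: bytes, block_size: int) -> str:
    """Simplified, hypothetical model of a block-composed file checksum.

    Split `data` into blocks of `block_size` bytes, MD5 each block, then
    MD5 the concatenation of the per-block digests.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    block_digests = b"".join(
        hashlib.md5(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    )
    return hashlib.md5(block_digests).hexdigest()


# 4096 bytes of identical content, checksummed with two block sizes.
data = bytes(range(256)) * 16
small_blocks = block_composed_checksum(data, 1024)
one_block = block_composed_checksum(data, 4096)
print(small_blocks == one_block)  # same bytes, different block sizes
```

Running it prints False: the bytes are identical, but with 1024-byte blocks the final MD5 covers four block digests, while with 4096-byte blocks it covers a single one.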

      I suggest adding an option to the hdfs dfs -checksum command that displays the block size along with the output. This information could also be helpful in other cases where the block size is needed.
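A sketch of how a block size shown next to the checksum could be used, assuming a hypothetical ChecksumReport record (path, algorithm, checksum, block size) rather than any existing Hadoop API: two checksums are treated as comparable only when both the algorithm and the block size match.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChecksumReport:
    # Hypothetical record: one checksum result plus the file's block size.
    path: str
    algorithm: str
    checksum: str
    block_size: int


def same_content(a: ChecksumReport, b: ChecksumReport) -> Optional[bool]:
    """Return True/False when the checksums are comparable, None otherwise.

    Block-composed checksums only say something about the content when the
    algorithm and the block size are the same on both sides.
    """
    if a.algorithm != b.algorithm or a.block_size != b.block_size:
        return None
    return a.checksum == b.checksum


# Placeholder values, not real HDFS output.
a = ChecksumReport("/a", "ALGO", "c1", 128)
b = ChecksumReport("/b", "ALGO", "c2", 256)
print(same_content(a, b))  # block sizes differ, so not comparable
```

Returning None instead of False keeps "different block size" distinct from "different content", which is the ambiguity described above.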

        Attachments

        1. HDFS-13960.001.patch
          5 kB
          Lokesh Jain
        2. HDFS-13960.002.patch
          6 kB
          Lokesh Jain
        3. HDFS-13960.003.patch
          6 kB
          Lokesh Jain


            People

            • Assignee: Lokesh Jain (ljain)
            • Reporter: Adam Antal (adam.antal)
            • Votes: 0
            • Watchers: 6

              Dates

              • Created:
                Updated:
                Resolved: