Hadoop HDFS / HDFS-14514

Actual read size of open file in encryption zone still larger than listing size even after enabling HDFS-11402 in Hadoop 2


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.6.5, 2.9.2, 2.8.5, 2.7.7
    • Fix Version/s: 2.10.0, 2.9.3
    • Component/s: encryption, hdfs, snapshots
    • Labels: None

    Description

      In Hadoop 2, when a file in an encryption zone is open for write, a snapshot is taken, and the file is then appended to, the number of bytes read from the file in the snapshot is larger than the size shown in the listing. This happens even when the immutable snapshot feature from HDFS-11402 is enabled.

      Note: The HDFS-8905 refactor, present in Hadoop 3.0 and later, silently fixed this bug (probably incidentally). Hadoop 2.x releases still suffer from the issue.

      Thanks to Stephen O'Donnell for locating the root cause in the codebase.

      Repro:
      1. Set dfs.namenode.snapshot.capture.openfiles to true in hdfs-site.xml, then start the HDFS cluster.
      2. Create an empty directory /dataenc, create an encryption zone on it, and allow snapshots on it:

      hadoop key create reprokey
      sudo -u hdfs hdfs dfs -mkdir /dataenc
      sudo -u hdfs hdfs crypto -createZone -keyName reprokey -path /dataenc
      sudo -u hdfs hdfs dfsadmin -allowSnapshot /dataenc
      

      3. Use a client that keeps a file open for write under /dataenc. For example, I'm using the Flume HDFS sink to tail a local file.
      4. Append to the file several times using the client, keeping the file open.
      5. Create a snapshot:

      sudo -u hdfs hdfs dfs -createSnapshot /dataenc snap1
      

      6. Append to the file one or more times, but don't let the file size exceed the block size. Wait several seconds for the append to be flushed to the DataNode.
      7. Run -ls on the file inside the snapshot, then read the file using -get. The number of bytes actually read is larger than the size reported by -ls.
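
The check in step 7 can be scripted. The following is a minimal sketch, not part of the original report: the file name under the snapshot is a hypothetical placeholder (use whatever file your client actually keeps open), and it reads the file with -cat piped to wc instead of -get, to avoid writing a local copy. It assumes the hdfs command is on the PATH.

```shell
#!/bin/sh
# Hypothetical file name: replace with the file your client keeps open.
# The /dataenc/.snapshot/snap1 prefix follows the repro steps above.
SNAP_FILE="${SNAP_FILE:-/dataenc/.snapshot/snap1/FlumeData.tmp}"

# Compare the size reported by the listing with the bytes actually read.
# Prints a verdict and returns 1 when more bytes were read than listed.
compare_sizes() {
    listed="$1"
    actual="$2"
    if [ "$actual" -gt "$listed" ]; then
        echo "MISMATCH: read $actual bytes, listing reports $listed"
        return 1
    fi
    echo "OK: read $actual bytes, listing reports $listed"
}

# Query HDFS for both sizes of one file, then compare them.
check_snapshot_file() {
    # -stat %b prints the file length in bytes as recorded by the NameNode.
    listed=$(hdfs dfs -stat %b "$1") || return 2
    # Count the bytes a reader actually receives; tr strips wc padding.
    actual=$(hdfs dfs -cat "$1" | wc -c | tr -d ' ')
    compare_sizes "$listed" "$actual"
}
```

After sourcing the script on a cluster node, run `check_snapshot_file "$SNAP_FILE"`. On an affected Hadoop 2.x cluster, following the repro steps is expected to produce the MISMATCH line.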

      The patch and an updated unit test will be uploaded later.

      Attachments

        1. HDFS-14514.branch-2.004.patch
          5 kB
          Wei-Chiu Chuang
        2. HDFS-14514.branch-2.003.patch
          4 kB
          Stephen O'Donnell
        3. HDFS-14514.branch-2.002.patch
          3 kB
          Stephen O'Donnell
        4. HDFS-14514.branch-2.001.patch
          3 kB
          Siyao Meng


          People

            Assignee: Siyao Meng (smeng)
            Reporter: Siyao Meng (smeng)
            Votes: 0
            Watchers: 5
