Hadoop HDFS / HDFS-9197

Snapshot FileDiff added to last snapshot when INodeFile accessTime field is updated


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.3.0, 2.4.0
    • Fix Version/s: None
    • Component/s: snapshots
    • Labels: None

    Description

      Summary
      When a file in HDFS is read, its corresponding inode's accessTime field is updated. If the file is present in the last snapshot, the accessTime change causes a FileDiff to be added to the SnapshotDiff of the last snapshot.
      This behavior has the following problems:

      • Since FileDiffs reside in memory on the NameNode, snapshots become progressively more memory-heavy as the volume of data in HDFS grows. On a system with frequent updates (e.g. hourly), this becomes a big problem: with, say, 2000 snapshots, one can have 2000 FileDiffs per file, all pointing to the same inode.
      • The FSImage grows tremendously in size, and the upload operation from the standby to the active NameNode takes much longer.
      • The generated FileDiff does not contain any useful information that I can see. Since all FileDiffs for that file point to the same inode, the accessTime they see is the same.
      • I was wrong about the last point. Each FileDiff includes a SnapshotCopy attribute, which contains the updated accessTime. This may be a feature, but I'd question the value of having it enabled by default.
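
The growth described in the list above can be illustrated with a toy model. This is only a sketch under stated assumptions, not HDFS code: `ToyFile`, `takeSnapshot`, `read` and `diffCount` are hypothetical names, every read is assumed to fall outside the accessTime precision window, and a file is assumed to record at most one diff per snapshot.

```java
import java.util.ArrayList;
import java.util.List;

public class FileDiffGrowth {
    /** Hypothetical stand-in for an inode that is tracked by snapshots. */
    static final class ToyFile {
        long accessTime = 0L;
        int latestSnapshotId = -1;    // -1 means no snapshot has been taken yet
        int lastDiffSnapshotId = -1;  // snapshot that already holds a diff for this file
        final List<Long> savedAccessTimes = new ArrayList<>(); // one entry per diff

        void takeSnapshot(int id) {
            latestSnapshotId = id;
        }

        /** A read updates accessTime; the old value is saved once per snapshot. */
        void read(long now) {
            boolean inSnapshot = latestSnapshotId >= 0;
            if (inSnapshot && lastDiffSnapshotId != latestSnapshotId) {
                savedAccessTimes.add(accessTime);
                lastDiffSnapshotId = latestSnapshotId;
            }
            accessTime = now;
        }
    }

    /** Number of diffs one file accumulates when it is read after every snapshot. */
    static int diffCount(int snapshots, int readsPerSnapshot) {
        ToyFile file = new ToyFile();
        long now = 0L;
        for (int id = 0; id < snapshots; id++) {
            file.takeSnapshot(id);
            for (int r = 0; r < readsPerSnapshot; r++) {
                now += 1;
                file.read(now);
            }
        }
        return file.savedAccessTimes.size();
    }

    public static void main(String[] args) {
        System.out.println(diffCount(2000, 1)); // one read per snapshot
        System.out.println(diffCount(2000, 5)); // extra reads add no further diffs
        System.out.println(diffCount(2000, 0)); // a file that is never read
    }
}
```

In this model the diff count per file is bounded by the number of snapshots rather than the number of reads, which matches the "2000 snapshots, 2000 FileDiffs per file" case above. A file that is never read accumulates no diffs.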

      Configuration:
      CDH 5.0.5 (Hadoop 2.3 / 2.4)
      We are NOT overriding the default parameter:
      DFS_NAMENODE_ACCESSTIME_PRECISION_DEFAULT = 3600000;
      Note that this value (in milliseconds) determines the allowed frequency of accessTime field updates: at most once per hour by default.
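
A minimal sketch of how such a precision setting gates accessTime updates is given here. `AccessTimeGate` and `shouldUpdate` are hypothetical names, and the exact comparison inside the NameNode may differ. The one documented detail relied on is that setting `dfs.namenode.accesstime.precision` to 0 disables access times.

```java
public class AccessTimeGate {
    // Default quoted in this report: one hour, in milliseconds.
    static final long PRECISION_DEFAULT_MS = 3_600_000L;

    /**
     * Assumed rule: update accessTime only if more than precisionMs has
     * passed since the stored value. A precision of 0 disables updates.
     */
    static boolean shouldUpdate(long nowMs, long lastAccessMs, long precisionMs) {
        if (precisionMs <= 0) {
            return false;
        }
        return nowMs > lastAccessMs + precisionMs;
    }

    public static void main(String[] args) {
        long lastAccess = 0L;
        long thirtyMinutes = 30L * 60 * 1000;
        long sixtyOneMinutes = 61L * 60 * 1000;

        // Read within the hour: no update, so no new FileDiff from this read.
        System.out.println(shouldUpdate(thirtyMinutes, lastAccess, PRECISION_DEFAULT_MS));
        // Read after more than an hour: accessTime is updated.
        System.out.println(shouldUpdate(sixtyOneMinutes, lastAccess, PRECISION_DEFAULT_MS));
        // Precision 0: access times are disabled altogether.
        System.out.println(shouldUpdate(sixtyOneMinutes, lastAccess, 0L));
    }
}
```

Under this assumption, a file read more than an hour after its last recorded access is the only case that changes the inode, which is why the reproduction below reads a file that was written more than one hour earlier.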

      How to reproduce:

      [root@node1076]# hdfs dfs -ls /data/tenants/testenv.testtenant/wddata
      Found 3 items
      drwxr-xr-x   - hdfs hadoop          0 2015-10-04 10:52 /data/tenants/testenv.testtenant/wddata/folder1
      -rw-r--r--   3 hdfs hadoop         38 2015-10-05 03:13 /data/tenants/testenv.testtenant/wddata/testfile1
      -rw-r--r--   3 hdfs hadoop         21 2015-10-04 10:45 /data/tenants/testenv.testtenant/wddata/testfile2
      [root@node1076]# hdfs dfs -ls /data/tenants/testenv.testtenant/wddata/.snapshot
      Found 8 items
      drwxr-xr-x   - hdfs hadoop          0 2015-10-04 10:47 /data/tenants/testenv.testtenant/wddata/.snapshot/sn1
      drwxr-xr-x   - hdfs hadoop          0 2015-10-04 10:47 /data/tenants/testenv.testtenant/wddata/.snapshot/sn2
      drwxr-xr-x   - hdfs hadoop          0 2015-10-04 10:52 /data/tenants/testenv.testtenant/wddata/.snapshot/sn3
      drwxr-xr-x   - hdfs hadoop          0 2015-10-04 10:53 /data/tenants/testenv.testtenant/wddata/.snapshot/sn4
      drwxr-xr-x   - hdfs hadoop          0 2015-10-04 10:57 /data/tenants/testenv.testtenant/wddata/.snapshot/sn5
      drwxr-xr-x   - hdfs hadoop          0 2015-10-04 10:58 /data/tenants/testenv.testtenant/wddata/.snapshot/sn6
      drwxr-xr-x   - hdfs hadoop          0 2015-10-05 03:13 /data/tenants/testenv.testtenant/wddata/.snapshot/sn7
      drwxr-xr-x   - hdfs hadoop          0 2015-10-05 04:20 /data/tenants/testenv.testtenant/wddata/.snapshot/sn8
      [root@node1076]# hdfs dfs -createSnapshot /data/tenants/testenv.testtenant/wddata sn9
      Created snapshot /data/tenants/testenv.testtenant/wddata/.snapshot/sn9
      [root@node1076]# hdfs snapshotDiff /data/tenants/testenv.testtenant/wddata sn8 sn9
      Difference between snapshot sn8 and snapshot sn9 under directory /data/tenants/testenv.testtenant/wddata:
      
      ################
      ## IMPORTANT: testfile1 was put into HDFS more than 1 hour ago, which triggers the accessTime update.
      ################
      [root@node1076]# hdfs dfs -cat /data/tenants/testenv.testtenant/wddata/testfile1
      This is test file 1, but now it's 11.
      [root@node1076]# hdfs dfs -createSnapshot /data/tenants/testenv.testtenant/wddata sn10
      Created snapshot /data/tenants/testenv.testtenant/wddata/.snapshot/sn10
      [root@node1076]# hdfs snapshotDiff /data/tenants/testenv.testtenant/wddata sn9 sn10
      Difference between snapshot sn9 and snapshot sn10 under directory /data/tenants/testenv.testtenant/wddata:
      M	./testfile1
      

          People

            Assignee: Unassigned
            Reporter: Alex Ivanov (axenol)
