[HDFS-11218] Add option to skip open files during HDFS Snapshots - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Workaround
Affects Version/s: 3.0.0-alpha1
Fix Version/s: 2.9.0, 3.0.0-beta1
Component/s: snapshots
Labels: None

Description

Problem:

When HDFS Snapshots are taken while files are being written, the Snapshots do capture those files, but they do not capture the point-in-time length of the files that are still open for write.

When a previously open file is closed, or when any other metadata modification is made to it, HDFS reconciles the file length and records the modification in the most recently taken Snapshot. All earlier Snapshots still hold the same open file with no modification recorded, so they end up using the final modification record in the next available Snapshot.
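The effect described above can be illustrated with a small toy model. This is only a sketch, not HDFS code: the `ToyFile` class, its methods, and the lookup rule (a Snapshot with no record of its own uses the record of the next Snapshot that has one) are assumptions made for illustration, as is the exact length value stored at close time.

```python
from bisect import bisect_left


class ToyFile:
    """Toy model of one file's per-snapshot records; hypothetical, not HDFS code."""

    def __init__(self):
        self.length = 0          # current (live) length of the file
        self.diff_ids = []       # snapshot ids that have a record, ascending
        self.diff_lengths = []   # length stored in each of those records

    def write(self, nbytes):
        # Writes to an open file do not create any snapshot record here.
        self.length += nbytes

    def close(self, latest_snapshot_id):
        # Assumption: on close, a single record is attached to the
        # most recently taken snapshot only.
        self.diff_ids.append(latest_snapshot_id)
        self.diff_lengths.append(self.length)

    def length_in_snapshot(self, snapshot_id):
        # A snapshot without its own record falls through to the next
        # snapshot that has one, or to the live file if none exists.
        i = bisect_left(self.diff_ids, snapshot_id)
        if i < len(self.diff_ids):
            return self.diff_lengths[i]
        return self.length


def simulate():
    f = ToyFile()
    expected = {}
    for snapshot_id in (1, 2, 3):
        f.write(100)
        expected[snapshot_id] = f.length  # true point-in-time length
    f.write(100)
    f.close(latest_snapshot_id=3)
    observed = {s: f.length_in_snapshot(s) for s in (1, 2, 3)}
    return expected, observed
```

In this model the three Snapshots should have seen lengths 100, 200 and 300, yet after the close all three resolve to the single record attached to the last Snapshot.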

Proposal:

The HDFS Snapshot design goal was O(M) space usage for Snapshots, where M is the number of file modifications. Recording modifications for every open file in every Snapshot would therefore be very expensive. For applications that do not want incomplete, partially written binary files captured in their Snapshots, an extra option to skip open files would be preferable. That way, they do not have to worry about restoring inconsistent files from the Snapshots.

hdfs dfs -createSnapshot -skipOpenFiles <snapDir> <snapName>
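The intended semantics of the proposed `-skipOpenFiles` option can be sketched as a simple filter. The `FileEntry` type and `create_snapshot` function below are hypothetical names used only for illustration; they do not correspond to any HDFS API, and the sketch ignores directories and Snapshot storage entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEntry:
    path: str
    length: int
    is_open: bool  # True while a writer still holds the file open


def create_snapshot(files, skip_open_files=False):
    """Return the entries a snapshot would contain in this toy model."""
    if skip_open_files:
        # Leave out files that are still being written.
        return [f for f in files if not f.is_open]
    return list(files)


files = [
    FileEntry("/data/a.bin", 1024, is_open=False),
    FileEntry("/data/b.bin", 512, is_open=True),
]
default_snapshot = create_snapshot(files)
skipping_snapshot = create_snapshot(files, skip_open_files=True)
```

With the option set, only `/data/a.bin` appears in the Snapshot, so a later restore never sees the partially written `/data/b.bin`.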

Issue Links

is related to: HDFS-11402 HDFS Snapshots should capture point-in-time copies of OPEN files (Resolved)

People

Assignee: Manoj Govindassamy

Reporter: Manoj Govindassamy

Votes: 0

Watchers: 6

Dates

Created: 07/Dec/16 19:27

Updated: 25/Aug/17 20:59

Resolved: 25/Aug/17 20:57