[HADOOP-3941] Extend FileSystem API to return file-checksums/file-digests - ASF JIRA

Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.19.0
Component/s: fs
Labels: None
Hadoop Flags: Reviewed
Release Note: Added new FileSystem APIs: FileChecksum and FileSystem.getFileChecksum(Path).

Description

Suppose we have two files of the same size in two locations (possibly on two different clusters). How can we tell whether their contents are the same?

Currently, the only way is to read both files and compare their contents. This is a very expensive operation if the files are huge.

So, we would like to extend the FileSystem API to support returning file-checksums/file-digests.
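
To make the idea concrete, here is a minimal sketch using only the Java standard library on local files. It is not the Hadoop implementation and does not call FileSystem.getFileChecksum(Path). The class name DigestCompare and its methods are hypothetical, and MD5 is an arbitrary choice of digest made for illustration. The point is that once each side can produce a short digest of its own file, only the digests need to be compared, not the file contents.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration class; not part of Hadoop.
public class DigestCompare {

    // Streams the file through MD5 (an assumed algorithm choice) and
    // returns the 16-byte digest without holding the file in memory.
    static byte[] digestOf(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buf = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        return md.digest();
    }

    // Differing digests prove the contents differ; equal digests mean the
    // contents are almost certainly the same (collisions are possible).
    static boolean sameDigest(byte[] a, byte[] b) {
        return MessageDigest.isEqual(a, b);
    }

    public static void main(String[] args) throws Exception {
        Path a = Files.createTempFile("digest-a", ".bin");
        Path b = Files.createTempFile("digest-b", ".bin");
        Path c = Files.createTempFile("digest-c", ".bin");
        try {
            // a and b hold identical bytes; c has the same size but different bytes.
            Files.write(a, "hello hadoop".getBytes(StandardCharsets.UTF_8));
            Files.write(b, "hello hadoop".getBytes(StandardCharsets.UTF_8));
            Files.write(c, "hello hadooq".getBytes(StandardCharsets.UTF_8));

            System.out.println("a vs b same digest: " + sameDigest(digestOf(a), digestOf(b)));
            System.out.println("a vs c same digest: " + sameDigest(digestOf(a), digestOf(c)));
        } finally {
            Files.deleteIfExists(a);
            Files.deleteIfExists(b);
            Files.deleteIfExists(c);
        }
    }
}
```

In the two-cluster scenario, each digest would be computed where the file lives, so only a few bytes per file would have to cross the network. Two digests are only comparable when both sides use the same algorithm.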

Attachments

3941_20080818.patch, 18/Aug/08 20:32, 10 kB, Tsz-wo Sze
3941_20080819.patch, 19/Aug/08 18:54, 15 kB, Tsz-wo Sze
3941_20080819b.patch, 19/Aug/08 22:36, 18 kB, Tsz-wo Sze
3941_20080820.patch, 20/Aug/08 18:50, 18 kB, Tsz-wo Sze
3941_20080826.patch, 26/Aug/08 21:12, 14 kB, Tsz-wo Sze
3941_20080827.patch, 27/Aug/08 18:25, 17 kB, Tsz-wo Sze
3941_20080904.patch, 05/Sep/08 01:10, 10 kB, Tsz-wo Sze

Issue Links

blocks: HADOOP-3981 Need a distributed file checksum algorithm for HDFS (Closed)

People

Assignee: Tsz-wo Sze

Reporter: Tsz-wo Sze

Votes: 0

Watchers: 3

Dates

Created: 12/Aug/08 23:05

Updated: 20/Nov/08 23:38

Resolved: 05/Sep/08 22:53