Details
- Type: New Feature
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
Currently FileSystem.Statistics exposes the following statistics:
- BytesRead
- BytesWritten
- ReadOps
- LargeReadOps
- WriteOps
These are in turn exposed as job counters by MapReduce and other frameworks. The logic within DfsClient that maps operations to these counters can be confusing. For instance, mkdirs counts as a writeOp.
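To make the ambiguity concrete, here is a minimal sketch of a coarse counter scheme. `CoarseStats` and its methods are hypothetical stand-ins, not the Hadoop classes, and the mapping of create and exists is an assumption for illustration; only the mkdirs-to-writeOp mapping is stated in this issue.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the coarse counters; not the Hadoop class.
final class CoarseStats {
    private final AtomicLong readOps = new AtomicLong();
    private final AtomicLong writeOps = new AtomicLong();

    // mkdirs is folded into writeOps, as described in this issue.
    void mkdirs() { writeOps.incrementAndGet(); }

    // Assumed for illustration: create is also folded into writeOps.
    void create() { writeOps.incrementAndGet(); }

    // Assumed for illustration: exists is folded into readOps.
    void exists() { readOps.incrementAndGet(); }

    long getReadOps() { return readOps.get(); }
    long getWriteOps() { return writeOps.get(); }
}
```

After two mkdirs calls and one create call, `getWriteOps()` reports a single total, so a reader of the job counters cannot tell directory creation from file creation.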
Proposed enhancement:
Add a statistic for each DfsClient operation, including create, append, createSymlink, delete, exists, mkdirs, and rename, and expose these statistics as new properties on the Statistics object. The operation-specific counters can then be used to analyze the load that a particular job imposes on HDFS.
For example, we can use them to identify jobs that end up creating a large number of files.
Once this information is available in the Statistics object, application frameworks such as MapReduce can expose it as additional counters, to be aggregated and recorded as part of the job summary.
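The proposal could be sketched roughly as follows. `OpType` and `OpStatistics` are hypothetical names chosen for this example and do not reflect the API of the committed patch; the operation list is the one given in this issue, and `merge` is an assumed way for a framework to roll per-task counts up into a job-level total.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical operation names, taken from the list in this issue.
enum OpType { CREATE, APPEND, CREATE_SYMLINK, DELETE, EXISTS, MKDIRS, RENAME }

// Hypothetical per-operation counter holder; not the Hadoop Statistics class.
final class OpStatistics {
    // The map is fully populated in the constructor and never resized,
    // so concurrent callers only touch the thread-safe LongAdder values.
    private final Map<OpType, LongAdder> counts = new EnumMap<>(OpType.class);

    OpStatistics() {
        for (OpType op : OpType.values()) {
            counts.put(op, new LongAdder());
        }
    }

    // Record one invocation of the given operation.
    void increment(OpType op) { counts.get(op).increment(); }

    // Current count for the given operation.
    long get(OpType op) { return counts.get(op).sum(); }

    // Fold another task's counts into this one, as a job-level rollup might.
    void merge(OpStatistics other) {
        for (OpType op : OpType.values()) {
            counts.get(op).add(other.get(op));
        }
    }
}
```

With counters of this shape, a job that creates an unusually large number of files would show up as a high `CREATE` count in the merged statistics, separate from `MKDIRS` or `RENAME`.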
Attachments
Issue Links
- breaks
  - HDFS-10418 NPE in TestDistributedFileSystem.testDFSCloseOrdering (Resolved)
- relates to
  - HADOOP-13028 add low level counter metrics for S3A; use in read performance tests (Resolved)
  - HADOOP-13171 Add StorageStatistics to S3A; instrument some more operations (Resolved)
  - HADOOP-15124 Slow FileSystem.Statistics counters implementation (Patch Available)
  - TEZ-3331 Add operation specific HDFS counters for Tez UI (Patch Available)