- Type: Sub-task
- Status: Open
- Priority: Minor
- Resolution: Unresolved
- Affects Version/s: 3.4.0
- Fix Version/s: None
- Component/s: fs/s3
- Labels: None
Collection of S3A upload stats from ProgressEvent callbacks can be improved
There are two similar but different listener implementations:
- org.apache.hadoop.fs.s3a.S3ABlockOutputStream.BlockUploadProgress
- org.apache.hadoop.fs.s3a.ProgressableProgressListener. Used on simple PUT calls.
Both call back into S3A FS to incrementWriteOperations; BlockUploadProgress also updates S3AInstrumentation/IOStatistics.
- I'm not 100% confident that BlockUploadProgress is updating things (especially the gauges of pending bytes) at the right time,
- or that completion is being handled correctly.
- ProgressableProgressListener doesn't update S3AInstrumentation, so its numbers are lost.
- There's no incremental updating during CommitOperations.uploadFileToPendingCommit(), which only calls Progressable.progress() once per block,
- nor in MultipartUploader.
Proposed:
- A single progress listener which updates BlockOutputStreamStatistics, used by all interfaces.
- WriteOperations to help set this up for callers,
- and its uploadPart API to take a Progressable (or the progress listener to use for uploading that part).
- The multipart upload API to also take a Progressable; this would help distcp-like applications.
Plus ITests to verify that the gauges come out right: at the end of each operation, the number of bytes pending upload == 0 and the number of bytes uploaded == the original size.
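
A rough sketch of the shape this could take, not the actual S3A code. Every name here (UploadProgressSketch, UploadStats, SharedProgressListener, simulateUpload) is hypothetical: UploadStats stands in for BlockOutputStreamStatistics, a plain Runnable stands in for Progressable.progress(), and the AWS SDK ProgressEvent callbacks are replaced by a loop that reports fixed-size chunks. It only illustrates one listener owning both the gauge updates and the progress callback, plus the end-of-operation invariant the tests would check.

```java
import java.util.concurrent.atomic.AtomicLong;

class UploadProgressSketch {

  /** Hypothetical stand-in for BlockOutputStreamStatistics: just the two values of interest. */
  static final class UploadStats {
    private final AtomicLong bytesPending = new AtomicLong();
    private final AtomicLong bytesUploaded = new AtomicLong();

    /** A block or file of the given size has been queued for upload. */
    void bytesQueued(long n) {
      bytesPending.addAndGet(n);
    }

    /** n more bytes have reached the store: move them from pending to uploaded. */
    void bytesTransferred(long n) {
      bytesPending.addAndGet(-n);
      bytesUploaded.addAndGet(n);
    }

    long pending() {
      return bytesPending.get();
    }

    long uploaded() {
      return bytesUploaded.get();
    }
  }

  /**
   * Hypothetical single listener shared by every upload path
   * (block output stream, simple PUT, committer upload, multipart uploader).
   */
  static final class SharedProgressListener {
    private final UploadStats stats;
    private final Runnable progressable; // stand-in for Progressable; may be null

    SharedProgressListener(UploadStats stats, Runnable progressable) {
      this.stats = stats;
      this.progressable = progressable;
    }

    void uploadQueued(long size) {
      stats.bytesQueued(size);
    }

    void bytesTransferred(long n) {
      stats.bytesTransferred(n);
      if (progressable != null) {
        progressable.run(); // incremental progress, not just once per block
      }
    }
  }

  /** Simulates the SDK reporting an upload of 'size' bytes in chunks of at most 'chunk' bytes. */
  static void simulateUpload(SharedProgressListener listener, long size, long chunk) {
    listener.uploadQueued(size);
    long remaining = size;
    while (remaining > 0) {
      long sent = Math.min(chunk, remaining);
      listener.bytesTransferred(sent);
      remaining -= sent;
    }
  }

  public static void main(String[] args) {
    UploadStats stats = new UploadStats();
    SharedProgressListener listener = new SharedProgressListener(stats, null);
    long size = 10L * 1024 * 1024;
    simulateUpload(listener, size, 64 * 1024);
    // The invariant from the ticket: nothing pending, everything uploaded.
    System.out.println("pending=" + stats.pending() + " uploaded=" + stats.uploaded());
  }
}
```

The sketch ignores failed and retried transfers, where bytes may be re-sent and the pending gauge would need resetting; that is exactly the completion handling the ticket is unsure about, so a real listener would need explicit failure and retry paths.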