Details
- Type: New Feature
- Status: Closed
- Priority: Major
- Resolution: Done
Description
The engine should collect job execution statistics (e.g., via accumulators) such as:
- total number of input / output records per operator
- histogram of input/output ratio of UDF calls
- histogram of number of input records per reduce / cogroup UDF call
- histogram of number of output records per UDF call
- histogram of time spent in UDF calls
- number of local and remote bytes read (not via accumulators)
- ...
These statistics should be made available to the user after execution (via the web frontend). The purpose of this feature is to ease performance debugging of parallel jobs (e.g., to detect data skew).
It should be possible to deactivate (or activate) the gathering of these statistics.
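As a rough illustration of the request, the sketch below shows one way per-operator statistics could be gathered around UDF calls and merged across parallel instances, with a switch to turn collection off. It is not the engine's API: the class `UdfStats` and its methods `recordCall`, `merge`, and `instrument` are hypothetical names, and a plain `java.util.function.Function` returning a list stands in for a flat-map style UDF.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

/** Hypothetical per-operator statistics holder; not part of the engine's API. */
final class UdfStats {
    private final boolean enabled;   // statistics gathering can be switched off
    private long inputRecords;
    private long outputRecords;
    // Histogram: output records per UDF call -> number of calls with that count.
    private final Map<Integer, Long> outputsPerCall = new TreeMap<>();

    UdfStats(boolean enabled) {
        this.enabled = enabled;
    }

    /** Records one UDF call that consumed inCount records and emitted outCount. */
    void recordCall(int inCount, int outCount) {
        if (!enabled) {
            return;
        }
        inputRecords += inCount;
        outputRecords += outCount;
        outputsPerCall.merge(outCount, 1L, Long::sum);
    }

    /** Folds in the statistics of another parallel instance, as an accumulator merge would. */
    void merge(UdfStats other) {
        inputRecords += other.inputRecords;
        outputRecords += other.outputRecords;
        other.outputsPerCall.forEach((k, v) -> outputsPerCall.merge(k, v, Long::sum));
    }

    long inputRecords() { return inputRecords; }
    long outputRecords() { return outputRecords; }
    Map<Integer, Long> outputsPerCall() { return outputsPerCall; }

    /** Wraps a flat-map style UDF so that every call is measured. */
    <I, O> Function<I, List<O>> instrument(Function<I, List<O>> udf) {
        return in -> {
            List<O> out = udf.apply(in);
            recordCall(1, out.size());
            return out;
        };
    }

    public static void main(String[] args) {
        UdfStats stats = new UdfStats(true);
        Function<String, List<String>> tokenize =
                stats.instrument(line -> Arrays.asList(line.split(" ")));
        tokenize.apply("a b c");
        tokenize.apply("d");
        // Prints: 2 in, 4 out, histogram {1=1, 3=1}
        System.out.println(stats.inputRecords() + " in, " + stats.outputRecords()
                + " out, histogram " + stats.outputsPerCall());
    }
}
```

A heavily skewed histogram (a few calls with very large output counts) would be the kind of signal this feature is meant to surface. Constructing the holder with `false` makes `recordCall` a no-op, which is one simple way to satisfy the activate/deactivate requirement.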
---------------- Imported from GitHub ----------------
Url: https://github.com/stratosphere/stratosphere/issues/456
Created by: fhueske
Labels: enhancement, runtime, user satisfaction
Created at: Tue Feb 04 20:32:49 CET 2014
State: open
Issue Links
- relates to FLINK-1297 Add support for tracking statistics of intermediate results (Resolved)