Details
- Type: Improvement
- Status: Resolved
- Priority: Minor
- Resolution: Fixed
- Version: 3.5.0
Description
The semantics of initValue and _zeroValue in SQLMetrics are a little confusing, since they effectively mean the same thing. Changing them to the following would be clearer, especially in terms of defining what an "invalid" metric is.
Proposed definitions:
- initValue is the starting value for a SQLMetric. If a metric's value equals its initValue, it should be filtered out before aggregating with SQLMetrics.stringValue().
- zeroValue defines the lowest value considered valid. If a SQLMetric is invalid, it is set to zeroValue upon receiving any update, and it also reports zeroValue as its value so that the invalid value is not exposed to the user programmatically (a concern previously addressed in SPARK-41442).
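To make the two proposed definitions concrete, here is a minimal sketch of them as a toy model. It is written in Python for brevity and is not Spark's SQLMetric implementation; the class name ToyMetric and its methods are hypothetical, and only the initValue/zeroValue behavior is taken from the description above.

```python
class ToyMetric:
    """Toy model of the proposed semantics (hypothetical, not Spark's SQLMetric)."""

    def __init__(self, init_value=-1, zero_value=0):
        self.init_value = init_value   # starting value of the metric
        self.zero_value = zero_value   # lowest value considered valid
        self._value = init_value

    def is_valid(self):
        # A metric is valid once its raw value is at least zero_value.
        return self._value >= self.zero_value

    def add(self, delta):
        # An invalid metric is first set to zero_value on any update.
        if not self.is_valid():
            self._value = self.zero_value
        self._value += delta

    @property
    def value(self):
        # An invalid metric reports zero_value, so the raw invalid
        # value is never exposed to callers.
        return self._value if self.is_valid() else self.zero_value


m = ToyMetric()
print(m.is_valid(), m.value)  # invalid by default, reports zero_value
m.add(5)
print(m.is_valid(), m.value)  # valid after the first update
```

With initValue = -1 and zeroValue = 0, the metric starts out invalid, reads as 0 from the outside, and becomes valid on its first update.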
For many SQLMetrics, we use initValue = -1 and zeroValue = 0 to indicate that the metric is invalid by default. At the end of a task, we update the metric, making it valid, and the invalid metrics are filtered out when calculating min, max, etc., as a workaround for SPARK-11013.
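The filtering step can be illustrated with a small sketch, again in Python and only as an illustration. The function name summarize and the dictionary it returns are assumptions for this example and do not mirror the real SQLMetrics.stringValue() output; the only behavior taken from the description is that values equal to initValue are dropped before computing the statistics.

```python
def summarize(task_values, init_value=-1):
    """Aggregate per-task metric values, skipping never-updated metrics.

    Hypothetical helper: values still equal to init_value are treated
    as invalid and are filtered out before computing the statistics.
    """
    valid = [v for v in task_values if v != init_value]
    if not valid:
        return None  # no task ever updated the metric
    return {"total": sum(valid), "min": min(valid), "max": max(valid)}


# Two of the four tasks never updated the metric (raw value -1).
raw = [-1, 10, 30, -1]
print(min(raw))        # naive min is skewed by the invalid entries
print(summarize(raw))  # statistics over the valid entries only
```

Without the filter, the minimum over the raw values would be the placeholder -1 rather than the smallest value any task actually recorded.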