Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-46294

Clean up initValue vs zeroValue semantics in SQLMetrics

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Minor
    • Resolution: Fixed
    • 3.5.0
    • 4.0.0
    • SQL

    Description

      The semantics of initValue and _zeroValue in SQLMetrics is a little bit confusing, since they effectively mean the same thing. Changing it to the following would be clearer, especially in terms of defining what an "invalid" metric is.
       
      proposed definitions:
       
      initValue is the starting value for a SQLMetric. If a metric has value equal to its initValue, then it should be filtered out before aggregating with SQLMetrics.stringValue().
       
      zeroValue defines the lowest value considered valid. If a SQLMetric is invalid, it is set to zeroValue upon receiving any updates, and it also reports zeroValue as its value to avoid exposing it to the user programatically (concern previouosly addressed in SPARK-41442).

      For many SQLMetrics, we use initValue = -1 and zeroValue = 0 to indicate that the metric is by default invalid. At the end of a task, we will update the metric making it valid, and the invalid metrics will be filtered out when calculating min, max, etc. as a workaround for SPARK-11013.

      Attachments

        Issue Links

          Activity

            People

              dtjong Davin Tjong
              dtjong Davin Tjong
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: