CRUNCH-558 (project: Crunch, retired)

Add name to Spark Accumulators


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.14.0
    • Component/s: Spark
    • Labels: None

    Description

      It was brought up on the mailing list that our Crunch counters are not showing up on the Spark web UI, possibly because they are not named:

      We are currently testing a few capabilities using Spark, and one thing we noticed is that Spark does not list any user-defined accumulators on the web UI.

      On MapReduce I would expect counters to be displayed on the job page. On a SparkPipeline, however, I was only able to pull counter information from PipelineResult#getStageResult().

      I think these accumulators are not visible on the web UI because Crunch does not name them. Spark expects an accumulator to have a name in order to show it on the UI.

      https://github.com/apache/crunch/blob/apache-crunch-0.13.0/crunch-spark/src/main/java/org/apache/crunch/impl/spark/SparkRuntime.java#L125-L126

      https://github.com/apache/spark/blob/v1.4.1/core/src/main/scala/org/apache/spark/api/java/JavaSparkContext.scala#L616-L624 (accumulator API with Name)

      I would like to know whether it is possible for Crunch to name these accumulators so that they are available in the web UI. That would let users monitor accumulators from the web UI and obtain key information about their jobs.
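
      The behaviour described above (an accumulator appears on the UI only when it has a name) can be illustrated with a small toy model. This is a sketch under stated assumptions, not Spark's or Crunch's code: the class `NamedAccumulatorSketch`, the `Accum` type, the `uiRows` method, and the name `crunch.counters` are all hypothetical stand-ins. In a real fix, the name would presumably be passed through the JavaSparkContext accumulator overload linked above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Toy model only: hypothetical stand-ins, not classes from Spark or Crunch.
public class NamedAccumulatorSketch {

    // An accumulator that may or may not carry a display name.
    static final class Accum {
        final Optional<String> name;
        long value;

        Accum(Optional<String> name) {
            this.name = name;
        }

        void add(long delta) {
            value += delta;
        }
    }

    // Hypothetical "web UI" listing: only named accumulators produce a row.
    static List<String> uiRows(List<Accum> accums) {
        List<String> rows = new ArrayList<>();
        for (Accum a : accums) {
            if (a.name.isPresent()) {
                rows.add(a.name.get() + " = " + a.value);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        Accum unnamed = new Accum(Optional.empty());
        Accum named = new Accum(Optional.of("crunch.counters")); // hypothetical name
        unnamed.add(3);
        named.add(3);

        List<Accum> all = new ArrayList<>();
        all.add(unnamed);
        all.add(named);

        // Both accumulators hold the same value, but only the named one is listed.
        System.out.println(uiRows(all)); // prints [crunch.counters = 3]
    }
}
```

      Both accumulators still collect values, which matches the report that counter data remained retrievable programmatically while being absent from the UI.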

      Attachments

        1. CRUNCH-558.patch
          1 kB
          Micah Whitacre

          People

            Assignee: Micah Whitacre (mkwhitacre)
            Reporter: Micah Whitacre (mkwhitacre)
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved: