Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.14.0
    • Component/s: Spark
    • Labels: None

      Description

      It was brought up on the mailing list that our Crunch counters are not showing up in the Spark web UI, possibly because they are not named.

      We are currently testing a few capabilities using Spark, and one thing we noticed is that Spark does not list any user-defined accumulators in the web UI.

      On MapReduce I would expect counters to be displayed on the job page; on a SparkPipeline, however, I was only able to pull counter information from PipelineResult#getStageResult().

      I think these accumulators are not visible in the web UI because Crunch does not name them. Spark only shows an accumulator in the UI if it has a name.

      https://github.com/apache/crunch/blob/apache-crunch-0.13.0/crunch-spark/src/main/java/org/apache/crunch/impl/spark/SparkRuntime.java#L125-L126

      https://github.com/apache/spark/blob/v1.4.1/core/src/main/scala/org/apache/spark/api/java/JavaSparkContext.scala#L616-L624 (accumulator API with Name)

      I would like to know whether Crunch can name these accumulators so that they appear in the web UI. Users could then monitor accumulators from the web UI to get key information about their jobs.
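
To illustrate the behaviour described above, here is a minimal, self-contained sketch in plain Java. It is a toy model, not Spark's or Crunch's actual code: the class `NamedAccumulatorSketch`, the `Acc` type, the `displayable` method, and the name "crunch-counters" are all hypothetical. It only shows the idea that a UI which lists accumulators by name has nothing to display for one created without a name. In real code the name would be passed through one of the named `accumulator` overloads on `JavaSparkContext` linked above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Toy model only: this does not use or reproduce Spark or Crunch internals.
public class NamedAccumulatorSketch {

    /** A counter that may or may not carry a display name. */
    static final class Acc {
        final Optional<String> name;
        long value;

        Acc(Optional<String> name, long initial) {
            this.name = name;
            this.value = initial;
        }

        void add(long delta) {
            value += delta;
        }
    }

    /** Returns the rows a UI could render: only accumulators that have a name. */
    static Map<String, Long> displayable(List<Acc> accs) {
        Map<String, Long> rows = new LinkedHashMap<>();
        for (Acc a : accs) {
            // An unnamed accumulator still holds its value but yields no row.
            a.name.ifPresent(n -> rows.put(n, a.value));
        }
        return rows;
    }

    public static void main(String[] args) {
        Acc unnamed = new Acc(Optional.empty(), 0);           // roughly what the report describes
        Acc named = new Acc(Optional.of("crunch-counters"), 0); // hypothetical name

        unnamed.add(5);
        named.add(5);

        List<Acc> all = new ArrayList<>();
        all.add(unnamed);
        all.add(named);

        // Both accumulators hold the value 5, but only the named one is listed.
        System.out.println(displayable(all)); // prints {crunch-counters=5}
    }
}
```

Both counters are updated identically; the only difference is whether a name was supplied at creation time, which is the change this issue asks for.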

        Attachments

        1. CRUNCH-558.patch
          1 kB
          Micah Whitacre

            People

            • Assignee:
              mkwhitacre Micah Whitacre
              Reporter:
              mkwhitacre Micah Whitacre
