Spark / SPARK-975

Spark Replay Debugger

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.9.0
    • Fix Version/s: None
    • Component/s: Spark Core

    Description

      The Spark debugger was first mentioned as rddbg in the RDD technical report.

      Arthur, written by Ankur Dave, is an earlier implementation of the Spark debugger that demonstrated both the elegance and the power of the RDD abstraction. Unfortunately, the corresponding GitHub branch was never merged into master, and development stopped two years ago. For more information about Arthur, please refer to the Spark Debugger wiki page in the old GitHub repository.

      A complete Spark debugger would be a useful tool for debugging and analyzing Spark applications. In PR-224, I propose a new implementation of the Spark debugger, the Spark Replay Debugger (SRD).

      PR-224 is only a preview for discussion. The current version implements only the features needed to illustrate the basic mechanisms. Some features that appeared in Arthur are still missing from SRD, such as checksum-based nondeterminism detection and single-task debugging with a conventional debugger (like jdb). However, these features can easily be built on top of the current SRD framework. To minimize code review effort, I intentionally left them out of the current version.
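
      For readers unfamiliar with the idea, checksum-based nondeterminism detection can be sketched roughly as follows. This is an illustration in plain Python, not code from Arthur or PR-224, and it does not use any Spark API. The names partition_checksum, find_nondeterministic_partitions, and compute_partition are hypothetical. The assumption is that a partition can be recomputed on demand, and that a partition whose checksum changes between two computations is likely produced by a nondeterministic transformation.

```python
import hashlib
import random


def partition_checksum(records):
    """Order-sensitive SHA-256 digest over the repr of each record."""
    digest = hashlib.sha256()
    for record in records:
        digest.update(repr(record).encode("utf-8"))
        digest.update(b"\x00")  # separator between consecutive records
    return digest.hexdigest()


def find_nondeterministic_partitions(compute_partition, num_partitions):
    """Compute every partition twice; return indices whose checksums differ.

    compute_partition is a hypothetical callable that maps a partition
    index to an iterable of records (standing in for replaying a task).
    """
    suspicious = []
    for index in range(num_partitions):
        first = partition_checksum(compute_partition(index))
        second = partition_checksum(compute_partition(index))
        if first != second:
            suspicious.append(index)
    return suspicious


def deterministic(index):
    # Same input always yields the same records.
    return [index * 10 + offset for offset in range(3)]


def nondeterministic(index):
    # Unseeded random source, so a replay yields different records.
    return [random.random() for _ in range(3)]


if __name__ == "__main__":
    print(find_nondeterministic_partitions(deterministic, 4))
    print(find_nondeterministic_partitions(nondeterministic, 4))
```

      In this sketch the checksum is order-sensitive, so a transformation that returns the same records in a different order is also flagged. Whether that is desirable depends on the downstream operations.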

      Attached is the visualization of the MLlib ALS application (with 1 iteration) generated by SRD. For more information, please refer to the SRD overview document.

      Attachments

        1. IMG_20140722_184149.jpg
          51 kB
          Phuoc Do
        2. RDD DAG.png
          10 kB
          Cheng Lian

          People

            Assignee: Unassigned
            Reporter: liancheng
            Votes: 6
            Watchers: 16
