SPARK-24215: Implement eager evaluation for DataFrame APIs

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.4.0
    • Component/s: PySpark, Spark Core, SQL
    • Labels: None

      Description

      To help people who are new to Spark get feedback more easily, we should implement the repr methods for Jupyter Python kernels. That way, when users run PySpark in a Jupyter console or notebook, they get useful feedback about the queries they've defined.

      This should include an option for eager evaluation (maybe spark.jupyter.eager-eval?). When set, the formatting methods would run DataFrames and produce output like show(). This strikes a good balance between not hiding Spark's action behavior and getting feedback to users who don't know to call actions.
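
      A minimal sketch of the idea, for illustration only. It does not use Spark: LazyFrame, eager_eval, max_rows and take are hypothetical names standing in for a lazily evaluated DataFrame and the proposed option. The only real convention relied on is that IPython/Jupyter looks for a _repr_html_ method and falls back to __repr__ when that method returns None.

```python
import html
from itertools import islice


class LazyFrame:
    """Hypothetical stand-in for a lazily evaluated DataFrame (not a Spark class)."""

    eager_eval = False  # hypothetical switch, analogous to the proposed option
    max_rows = 20       # hypothetical cap on rows fetched for display

    def __init__(self, columns, row_source):
        self.columns = list(columns)
        self._row_source = row_source  # zero-argument callable returning an iterable of rows
        self.runs = 0                  # how many times an action has executed the "query"

    def take(self, n):
        """Action: execute the query and return at most n rows."""
        self.runs += 1
        return list(islice(self._row_source(), n))

    def __repr__(self):
        # The plain-text repr stays lazy: it describes the frame without running it.
        return "LazyFrame[" + ", ".join(self.columns) + "]"

    def _repr_html_(self):
        # Rich-display hook looked up by IPython/Jupyter. Returning None makes
        # the front end fall back to __repr__, so nothing runs unless opted in.
        if not LazyFrame.eager_eval:
            return None
        header = "".join(f"<th>{html.escape(str(c))}</th>" for c in self.columns)
        body = "".join(
            "<tr>" + "".join(f"<td>{html.escape(str(v))}</td>" for v in row) + "</tr>"
            for row in self.take(LazyFrame.max_rows)
        )
        return f"<table><tr>{header}</tr>{body}</table>"
```

      With the switch off, displaying a frame in a notebook cell shows only its schema-like repr and triggers no execution; with it on, the same cell runs the query once and renders a bounded table. For reference, the option that shipped with the 2.4.0 fix is named spark.sql.repl.eagerEval.enabled rather than the name floated above.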

      Here's the dev list thread for context: http://apache-spark-developers-list.1001551.n3.nabble.com/eager-execution-and-debuggability-td23928.html

            People

            • Assignee: Yuanjian Li (XuanYuan)
            • Reporter: Ryan Blue (rdblue)
            • Votes: 0
            • Watchers: 11
