Zeppelin / ZEPPELIN-1448

Add CPU time to paragraph output for %pyspark, %spark, %python, and %sql interpreters

    Details

    • Flags:
      Patch, Important

      Description

      In Zeppelin, when using the PySpark interpreter (%pyspark) in a cell or "paragraph," I want the output to list the CPU time in addition to the evaluated output and the actual (physical) time elapsed.

      A specific example of a statement I want to time (not necessarily limited to SQL queries) is something like this:

      %pyspark
      from pyspark.sql import SQLContext
      ...
      sqlctx = SQLContext(sc)
      ...
      sqlctx.sql("SELECT feature1, feature2, feature3 FROM tableName "
                 "WHERE feature3 = 'a' LIMIT 100").show()

      or a SQL count over all rows in the table.

      If I use Zeppelin with the Hive interpreter in a paragraph (%hive), the output automatically includes actual time, CPU time, and the evaluated output.

      Similarly, if I use IPython (either in a shell or in a Jupyter notebook), I can preface a statement with %time to have the output returned along with the CPU time.
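
      Until such a feature exists, the kind of timing that %time reports can be approximated by hand with the Python standard library. The following is only a minimal sketch: cpu_and_wall_timer is a hypothetical helper name, not part of Zeppelin or PySpark, and the workload is a stand-in for the real query. Note that time.process_time() counts CPU time of the current Python process only, so in a %pyspark paragraph it would not include CPU time spent in the JVM or on Spark executors.

```python
import time
from contextlib import contextmanager


@contextmanager
def cpu_and_wall_timer(label="paragraph"):
    """Hypothetical helper: report CPU and wall-clock time for a block."""
    timings = {}
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # user + system CPU of this process only
    try:
        yield timings
    finally:
        timings["cpu_s"] = time.process_time() - cpu_start
        timings["wall_s"] = time.perf_counter() - wall_start
        print("%s: CPU time %.3f s, wall time %.3f s"
              % (label, timings["cpu_s"], timings["wall_s"]))


# Stand-in workload; in a %pyspark paragraph this would be the
# sqlctx.sql(...).show() call from the description.
with cpu_and_wall_timer("example") as t:
    total = sum(i * i for i in range(200000))
    time.sleep(0.05)  # sleeping adds wall time but almost no CPU time
```

      Wrapping the statement in a context manager keeps the timed code unchanged, and the timings dictionary is filled in when the block exits, so the values can be inspected afterwards as well as printed.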

      Please add the capability to return CPU time in a (PySpark, etc.) paragraph in a Zeppelin notebook.

            People

            • Assignee:
              Unassigned
            • Reporter:
              Jennifer Jones (jljones)
            • Votes:
              1
            • Watchers:
              2

                Time Tracking

                Original Estimate:
                168h
                Remaining Estimate:
                168h
                Time Spent:
                Not Specified