Details
-
New Feature
-
Status: Open
-
Major
-
Resolution: Unresolved
-
0.7.0
-
None
-
Mac OS X El Capitan Version 10.11.6, Spark 2.0.0, Zeppelin 0.6.1, and the Anaconda distribution of Python 3.5.2.
-
Patch, Important
Description
In Zeppelin, when using the PySpark interpreter (%pyspark) in a cell (a "paragraph"), I want the output to list the CPU time in addition to the evaluated output and the actual (wall-clock) time elapsed.
A specific example of a statement I want to time (not necessarily limited to SQL queries) is something like this:
%pyspark
...
sqlctx = SQLContext(sc)
...
sqlctx.sql("SELECT feature1, feature2, feature3 FROM tableName " +
           "WHERE feature3 = 'a' LIMIT 100").show()
or a SQL count over the total number of rows in the table.
If I use Zeppelin with the Hive interpreter in a paragraph (%hive), the output automatically includes actual time, CPU time, and the evaluated output.
Similarly, if I use IPython (either in a shell or in a Jupyter notebook), I can prefix a statement with the %time magic to have the output returned along with the CPU time and wall time.
Please add the capability to report CPU time for a paragraph (PySpark and other interpreters) in a Zeppelin notebook.
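To illustrate the kind of output I have in mind, here is a rough sketch of a manual workaround using only the Python standard library. The helper name `report_times` and the stand-in workload are hypothetical, not part of Zeppelin or PySpark. One caveat: `time.process_time()` only counts CPU time of the Python process it runs in, so inside a %pyspark paragraph it would likely miss work done in the JVM and on the Spark executors. A built-in feature would presumably need to account for that.

```python
import time
from contextlib import contextmanager


@contextmanager
def report_times(label="statement"):
    """Hypothetical helper: print CPU and wall-clock time for the enclosed block."""
    timing = {}
    wall_start = time.perf_counter()   # wall-clock (elapsed) time
    cpu_start = time.process_time()    # user + system CPU time of this process only
    try:
        yield timing
    finally:
        timing["wall"] = time.perf_counter() - wall_start
        timing["cpu"] = time.process_time() - cpu_start
        print("{}: CPU time {:.3f} s, wall time {:.3f} s".format(
            label, timing["cpu"], timing["wall"]))


# Stand-in workload; in a %pyspark paragraph this would be the
# sqlctx.sql(...).show() call instead.
with report_times("sum of squares") as timing:
    total = sum(i * i for i in range(100000))
```

After the block finishes, `timing["cpu"]` and `timing["wall"]` hold the two measurements in seconds, so they can also be logged or compared across runs.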