Description
Calling cache or persist on a DataFrame after orderBy or sortBy is not lazy; it immediately triggers a Spark job:
spark.range(1, 1000).orderBy("id").cache()
Other operations do not generate a job when cached:
spark.range(1, 1000).repartition(2).cache()
spark.range(1, 1000).groupBy("id").agg(fn.min("id")).cache()
spark.range(1, 1000).cache()