Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Incomplete
Affects Version/s: 2.3.0
Fix Version/s: None
Description
Currently, Spark beginners often do not realize that our persist API is lazy: calling persist() only marks the Dataset for caching, and nothing is stored until an action runs. They also do not know the most efficient way to materialize the cache. Sometimes they just call collect(), which is very expensive when the data set is big because it brings every row back to the driver.
In addition, we need another API to verify whether a Dataset has been cached and materialized.
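
To make the two requests concrete, here is a minimal pure-Python sketch of the behavior described above. It is a toy model, not Spark code: the class `LazyDataset`, the `is_materialized` property, and the `compute_calls` counter are all hypothetical names invented for illustration, and the real Dataset API may end up looking quite different. It only shows that persist() by itself computes nothing, that a cheap action such as count() is enough to fill the cache, and what a "cached and materialized" check could report.

```python
class LazyDataset:
    """Toy stand-in for a Dataset. Hypothetical; this is NOT Spark's API."""

    def __init__(self, compute):
        self._compute = compute          # zero-argument function yielding rows
        self._persist_requested = False  # set by persist(), nothing computed yet
        self._cache = None               # filled by the first action after persist()
        self.compute_calls = 0           # how many times the rows were recomputed

    def persist(self):
        # Lazy: only records the request. No rows are computed here.
        self._persist_requested = True
        return self

    def _rows(self):
        # Serve from the cache when it has already been materialized.
        if self._cache is not None:
            return self._cache
        self.compute_calls += 1
        rows = list(self._compute())
        if self._persist_requested:
            self._cache = rows
        return rows

    def count(self):
        # Action that returns one number to the caller.
        return len(self._rows())

    def collect(self):
        # Action that returns every row to the caller (expensive for big data).
        return list(self._rows())

    @property
    def is_cached(self):
        # True once persist() was requested, even if nothing is stored yet.
        return self._persist_requested

    @property
    def is_materialized(self):
        # Hypothetical check of the kind this issue asks for.
        return self._cache is not None


ds = LazyDataset(lambda: range(1000)).persist()
state_after_persist = (ds.is_cached, ds.is_materialized, ds.compute_calls)

n = ds.count()  # first action: computes the rows and fills the cache
state_after_count = (ds.is_cached, ds.is_materialized, ds.compute_calls)

ds.count()      # second action: served from the cache, no recomputation
calls_after_second_count = ds.compute_calls
```

In this sketch the dataset is marked as cached right after persist(), but it only becomes materialized after the first action. In Spark itself, the commonly used way to force materialization is to call count() after persist() rather than collect(), since count() returns a single number to the driver instead of all rows.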