[SPARK-21510] Add isMaterialized() and eager persist() to Dataset APIs - ASF JIRA


Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Incomplete
Affects Version/s: 2.3.0
Fix Version/s: None
Component/s: SQL
Labels:
- bulk-closed

Description

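Currently, Spark beginners often do not realize that our persist API is lazy: calling persist() only marks a Dataset for caching, and nothing is stored until an action runs. They also do not know the most efficient way to materialize the cache. Some simply call collect(), which is very expensive when the dataset is large because it ships every row to the driver.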

In addition, we need another API to verify whether a Dataset has been both cached and materialized.


Issue Links

links to

[Github] Pull Request #18717 (gatorsmile)


People

Assignee: Xiao Li
Reporter: Xiao Li
Votes: 0
Watchers: 3

Dates

Created: 23/Jul/17 04:21
Updated: 25/May/21 01:53
Resolved: 25/May/21 01:44