Description
To fully support Spark RDD's persistence options, we need to provide a few features.
We need to support:
- Default persistence in memory or on disk.
- Persistence using memory and disk at the same time (spill).
- Persistence in off-heap memory.
- Replication of persisted data.
- Locking the persistence strategy once an RDD has been executed.
- Reporting the actual state of cached data to the optimizer.
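The requirements above can be sketched as a small, self-contained Python model. This is a hypothetical illustration, not the project's or Spark's implementation: `StorageLevel`, `PersistedData`, and the level constants are names introduced here. The flags loosely follow the fields of Spark's `StorageLevel` (`useDisk`, `useMemory`, `useOffHeap`, `replication`), but the values are illustrative. "Disk" is simulated by a dictionary, and off-heap storage and replication are only recorded as flags, not actually performed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageLevel:
    """Hypothetical persistence strategy; flags loosely follow Spark's StorageLevel."""
    use_disk: bool
    use_memory: bool
    use_off_heap: bool = False  # recorded only; not modeled separately below
    replication: int = 1        # recorded only; this sketch does not copy data

    def __post_init__(self):
        if self.replication < 1:
            raise ValueError("replication must be at least 1")
        if self.use_off_heap and not self.use_memory:
            raise ValueError("off-heap persistence is a kind of memory persistence")


# Illustrative levels, one per requirement (hypothetical names, not Spark's constants).
MEMORY_DEFAULT = StorageLevel(use_disk=False, use_memory=True)
DISK_DEFAULT = StorageLevel(use_disk=True, use_memory=False)
SPILL = StorageLevel(use_disk=True, use_memory=True)
OFF_HEAP = StorageLevel(use_disk=False, use_memory=True, use_off_heap=True)
REPLICATED = StorageLevel(use_disk=False, use_memory=True, replication=2)


class PersistedData:
    """Hypothetical cache for one RDD's partitions; 'disk' is simulated by a dict."""

    def __init__(self, level, memory_capacity):
        self.level = level
        self.memory_capacity = memory_capacity  # max partitions kept in memory
        self.memory = {}
        self.disk = {}
        self.executed = False

    def set_level(self, level):
        # Once the RDD has been executed, its strategy is locked.
        if self.executed and level != self.level:
            raise RuntimeError("cannot change persistence strategy after execution")
        self.level = level

    def put(self, partition_id, data):
        # Storing a computed partition means the RDD has been executed.
        self.executed = True
        if self.level.use_memory and len(self.memory) < self.memory_capacity:
            self.memory[partition_id] = data
        elif self.level.use_disk:
            # Memory is full or unused: spill the partition to the disk tier.
            self.disk[partition_id] = data
        # Otherwise the partition is not cached and would have to be recomputed.

    def report(self):
        # Actual (not merely requested) cached state, e.g. for an optimizer.
        return {"in_memory": sorted(self.memory), "on_disk": sorted(self.disk)}


cache = PersistedData(SPILL, memory_capacity=2)
for pid in range(3):
    cache.put(pid, [pid])
print(cache.report())  # {'in_memory': [0, 1], 'on_disk': [2]}
```

In this model, requesting a different level after the first `put` raises an error, while re-requesting the same level is allowed. Reporting what was actually stored, rather than the requested level, is what lets an optimizer see partitions that spilled to disk or were never cached.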
Sub-tasks
# | Sub-task | Status | Assignee
1 | Implement disk and memory persistence (spill) | Open | Unassigned
2 | Support RDD caching | Resolved | Sanha Lee
3 | Implement off-heap memory persistence | Open | Unassigned
4 | Implement replication for persisted data | Open | Unassigned
5 | Disable changing the persistence strategy after an RDD is calculated | Open | Unassigned
6 | Report the actual state of cached data to the optimizer | Open | Unassigned