Context managers are a natural way to capture closely related setup and teardown code in Python.
For example, they are commonly used when doing file I/O:
Once the program exits the with block, f is automatically closed.
I think it makes sense to apply this pattern to persisting and unpersisting DataFrames and RDDs. There are many cases when you want to persist a DataFrame for a specific set of operations and then unpersist it immediately afterwards.
For example, take model training. Today, you might do something like this:
If persist() returned a context manager, you could rewrite this as follows:
Upon exiting the with block, labeled_data would automatically be unpersisted.
This can be done in a backwards-compatible way since persist() would still return the parent DataFrame or RDD as it does today, but add two methods to the object: __enter__() and __exit__()