Description
Currently, Spark cleans up regular shuffle data files but not the merged files produced by push-based shuffle.
In our production cluster, the push-based shuffle service creates too many merged shuffle data files because several Spark Thrift Servers are running.
Could the merged data files be cleaned up after a query finishes?
Issue Links
- duplicates: SPARK-38005 Support cleaning up merged shuffle files and state from external shuffle service (Resolved)