For use cases where we need to delete very large amounts of data from Phoenix tables, running a synchronous delete can be problematic. To guarantee that the delete completes, to handle failure scenarios, and to keep it from putting so much load on the HBase cluster that it crowds out other running queries, we need to build tooling around these long-running delete operations: chunking them up, retrying in the event of failures, and throttling the delete load if the Region Servers get hot.
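To make the kind of tooling described above concrete, here is a minimal sketch of a chunked delete loop with retries and a fixed pause between chunks. It is an illustration under assumptions, not our actual tooling: `chunked_delete` and `delete_chunk` are hypothetical names, and `delete_chunk(limit)` stands for any callable that deletes at most `limit` matching rows and returns the number it deleted.

```python
import time
from typing import Callable


def chunked_delete(
    delete_chunk: Callable[[int], int],
    chunk_size: int = 10_000,
    max_retries: int = 3,
    pause_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Delete rows in bounded chunks and return the total number deleted.

    delete_chunk is a hypothetical callable (an assumption of this sketch):
    it deletes at most `limit` matching rows and returns how many it deleted.
    """
    total = 0
    while True:
        attempt = 0
        while True:
            try:
                deleted = delete_chunk(chunk_size)
                break
            except Exception:
                attempt += 1
                if attempt > max_retries:
                    # Give up on this chunk and surface the last error.
                    raise
                # Back off exponentially before retrying the same chunk.
                sleep(pause_seconds * 2 ** attempt)
        total += deleted
        if deleted < chunk_size:
            # A partial (or empty) chunk means no matching rows remain.
            return total
        # Crude throttle: a fixed pause between chunks to limit cluster load.
        sleep(pause_seconds)
```

In practice `delete_chunk` might wrap a bounded DELETE statement issued over a Phoenix connection, and the fixed pause would likely be replaced by something that reacts to Region Server load. Both are left out here because they depend on the deployment.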
It would be really great if Phoenix offered a way to invoke a resilient delete that is processed asynchronously and puts minimal load on the cluster.
One idea that has been mentioned for implementing this is to introduce a DEFERRED keyword for the DELETE operation, so that such a delete removes the data at compaction time.
For our use cases, we would ideally like to set delete filters based on the first two elements of the row key (a multi-tenant ID and the item that follows it).