We held an internal brainstorming session to study the feasibility of this. Some of our recent tests on SSDs such as Optane show that they are vastly faster at random reads and can act as effective caches.
Currently we have a single tier of bucket cache, which can be configured either off-heap or in file mode (file mode can be backed by multiple files).
This model restricts us to using either memory or files, but not both.
With the advent of faster devices such as Optane SSDs and NVMe-based drives, we should try to use all of these devices as part of the bucket cache, so that fewer reads pay the cost of the slower devices on the HDFS DataNodes where the actual data resides.
In addition, we can let the user configure the caching tier per column family or table, so that each workload makes effective use of the available tiers.
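To make the tiering idea concrete, here is a toy model of a two-tier block cache. It is a minimal sketch under loud assumptions: it is NOT HBase's BucketCache API, and every name in it (`TieredBlockCache`, `CacheTier`, `setFamilyTier`, and the rest) is hypothetical. Both tiers are modeled as in-process LRU maps; a real file tier would store blocks in files on an Optane/NVMe device. Blocks evicted from the memory tier are demoted to the file tier, a file-tier hit promotes the block back to memory, and a per-family setting can pin a family to the file tier.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical sketch only; not HBase's BucketCache API. Both tiers are
// in-process LRU maps; a real file tier would live on an SSD (Optane/NVMe).
public class TieredCacheSketch {

    /** Hypothetical per-family tier preference. */
    enum CacheTier { MEMORY, FILE }

    /** LRU map with a fixed entry capacity that reports evicted entries. */
    static class LruTier<K, V> extends LinkedHashMap<K, V> {
        private final int capacity;
        private final BiConsumer<K, V> onEvict;

        LruTier(int capacity, BiConsumer<K, V> onEvict) {
            super(16, 0.75f, true); // access order, so the eldest entry is the LRU one
            this.capacity = capacity;
            this.onEvict = onEvict;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            if (size() > capacity) {
                onEvict.accept(eldest.getKey(), eldest.getValue());
                return true;
            }
            return false;
        }
    }

    static class TieredBlockCache {
        private final Map<String, CacheTier> familyTier = new HashMap<>();
        private final LruTier<String, byte[]> fileTier;
        private final LruTier<String, byte[]> memoryTier;

        TieredBlockCache(int memoryBlocks, int fileBlocks) {
            // Blocks evicted from the file tier are simply dropped; they still exist on HDFS.
            this.fileTier = new LruTier<>(fileBlocks, (k, v) -> { });
            // Blocks evicted from the memory tier are demoted to the file tier.
            this.memoryTier = new LruTier<>(memoryBlocks, fileTier::put);
        }

        void setFamilyTier(String family, CacheTier tier) {
            familyTier.put(family, tier);
        }

        private CacheTier tierOf(String family) {
            return familyTier.getOrDefault(family, CacheTier.MEMORY);
        }

        void cacheBlock(String family, String blockKey, byte[] block) {
            if (tierOf(family) == CacheTier.MEMORY) {
                fileTier.remove(blockKey);
                memoryTier.put(blockKey, block);
            } else {
                memoryTier.remove(blockKey);
                fileTier.put(blockKey, block);
            }
        }

        /** Checks memory, then the file tier; null means a full miss (read from HDFS). */
        byte[] getBlock(String family, String blockKey) {
            byte[] block = memoryTier.get(blockKey);
            if (block != null) {
                return block;
            }
            block = fileTier.get(blockKey);
            if (block != null && tierOf(family) == CacheTier.MEMORY) {
                fileTier.remove(blockKey);
                memoryTier.put(blockKey, block); // promote the hot block to memory
            }
            return block;
        }

        boolean inMemory(String blockKey) { return memoryTier.containsKey(blockKey); }
        boolean inFile(String blockKey) { return fileTier.containsKey(blockKey); }
    }

    public static void main(String[] args) {
        TieredBlockCache cache = new TieredBlockCache(2, 4);
        cache.setFamilyTier("cold_cf", CacheTier.FILE);
        cache.cacheBlock("hot_cf", "b1", new byte[] {1});
        cache.cacheBlock("hot_cf", "b2", new byte[] {2});
        cache.cacheBlock("hot_cf", "b3", new byte[] {3});  // evicts b1 from memory into the file tier
        cache.cacheBlock("cold_cf", "c1", new byte[] {9}); // goes straight to the file tier
        System.out.println("b1 in memory=" + cache.inMemory("b1") + ", in file=" + cache.inFile("b1"));
        // prints "b1 in memory=false, in file=true"
        cache.getBlock("hot_cf", "b1"); // file-tier hit promotes b1 and demotes b2
        System.out.println("b1 in memory=" + cache.inMemory("b1") + ", b2 in file=" + cache.inFile("b2"));
        // prints "b1 in memory=true, b2 in file=true"
    }
}
```

Demoting on eviction, rather than dropping, is what lets the faster SSD tier absorb reads that would otherwise go to the DataNodes; the per-family setting is one possible shape for the user-facing configuration, not a settled design.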
I can upload a design doc here. Before that, I would like to hear your suggestions. Thoughts?