Description
Originally reported by the user:
https://github.com/apache/hudi/issues/6137
The crux of the issue is that the Databricks Runtime (DBR) diverges from OSS Spark; in this case, the `FileStatusCache` API differs between the two.
There are a few approaches we can take:
- Avoid relying on Spark's `FileStatusCache` implementation altogether and use our own
- Take a staggered approach: first try to use Spark's `FileStatusCache`, and if it doesn't match the expected API, fall back to our own implementation
Approach #1 would mean that we no longer share a cache implementation with Spark, so in some cases we might keep two instances of the same cache. Approach #2 avoids that: we fall back to our own implementation only when Spark's API is incompatible.
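
A minimal sketch of approach #2 in plain Java, to make the fallback concrete. It is not Hudi's or Spark's actual code: `ListingCache`, `SharedListingCache`, `OwnListingCache`, `resolve`, and both engine-cache classes are hypothetical stand-ins, and paths and file listings are simplified to strings. The sketch probes the engine's cache reflectively, so an incompatible API surfaces as a `NoSuchMethodException` at resolution time instead of a `NoSuchMethodError` in the middle of a query.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: every type and method name below is a hypothetical stand-in,
// not the actual Spark, DBR, or Hudi API.
class FileStatusCacheFallback {

    /** The minimal contract the caller needs from a file-listing cache. */
    interface ListingCache {
        List<String> getLeafFiles(String path); // returns null when absent
        void putLeafFiles(String path, List<String> files);
    }

    /** Our own implementation, used only when the engine's cache is incompatible. */
    static final class OwnListingCache implements ListingCache {
        private final Map<String, List<String>> entries = new ConcurrentHashMap<>();
        public List<String> getLeafFiles(String path) { return entries.get(path); }
        public void putLeafFiles(String path, List<String> files) { entries.put(path, files); }
    }

    /** Adapter that shares the engine's cache instance through reflection. */
    static final class SharedListingCache implements ListingCache {
        private final Object engineCache;
        private final Method get;
        private final Method put;

        SharedListingCache(Object engineCache) throws NoSuchMethodException {
            // Look up the expected signatures once; a mismatch fails here.
            Class<?> type = engineCache.getClass();
            this.engineCache = engineCache;
            this.get = type.getMethod("getLeafFiles", String.class);
            this.put = type.getMethod("putLeafFiles", String.class, List.class);
        }

        @SuppressWarnings("unchecked")
        public List<String> getLeafFiles(String path) {
            return (List<String>) invoke(get, path);
        }

        public void putLeafFiles(String path, List<String> files) {
            invoke(put, path, files);
        }

        private Object invoke(Method method, Object... args) {
            try {
                return method.invoke(engineCache, args);
            } catch (IllegalAccessException | InvocationTargetException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    /** Approach #2: prefer the engine's cache, fall back only when its API differs. */
    static ListingCache resolve(Object engineCache) {
        try {
            return new SharedListingCache(engineCache);
        } catch (NoSuchMethodException e) {
            return new OwnListingCache();
        }
    }

    /** Stand-in for an engine cache that exposes the expected methods. */
    public static final class CompatibleEngineCache {
        private final Map<String, List<String>> entries = new ConcurrentHashMap<>();
        public List<String> getLeafFiles(String path) { return entries.get(path); }
        public void putLeafFiles(String path, List<String> files) { entries.put(path, files); }
    }

    /** Stand-in for a divergent runtime whose put method has a different signature. */
    public static final class DivergentEngineCache {
        public List<String> getLeafFiles(String path) { return null; }
        public void putLeafFiles(String path, List<String> files, long sizeHint) { }
    }

    public static void main(String[] args) {
        CompatibleEngineCache engine = new CompatibleEngineCache();
        ListingCache shared = resolve(engine);
        shared.putLeafFiles("/tbl/p=1", Arrays.asList("a.parquet"));
        // The entry lands in the engine's own cache, so only one copy exists.
        System.out.println(engine.getLeafFiles("/tbl/p=1"));
        // The divergent stand-in lacks the expected signature, so we fall back.
        System.out.println(resolve(new DivergentEngineCache()).getClass().getSimpleName());
    }
}
```

With the compatible stand-in, writes go through to the engine's cache and no second cache is created, which is the benefit approach #2 has over approach #1. With the divergent stand-in, `resolve` returns the standalone implementation, so the duplicate-cache cost is paid only on runtimes where sharing is not possible.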