In the Impala code we have checks in various places for which filesystem implementation we are using. E.g. in the frontend, many of these checks are in FileSystemUtil.java - https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java.
In the frontend, some of these checks are done using the instanceof operator with subclasses of org.apache.hadoop.fs.FileSystem. E.g.
We also identify the filesystem based on the scheme, e.g. s3a in a URL like s3a://path/.
The proposal is to replace all uses of instanceof with checks based on the scheme, which we can get from the FileSystem - https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/fs/FileSystem.html#getScheme--
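A rough sketch of what a scheme-based check could look like. To keep it self-contained it parses the scheme from a location string with java.net.URI instead of calling into Hadoop; in Impala the scheme would come from the FileSystem as described above. The class name, method names and the scheme set are all hypothetical, not existing Impala code.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;

final class SchemeCheckSketch {
  // Hypothetical set of schemes treated as S3. Only "s3a" is taken from the
  // description above; a real list would live wherever Impala keeps this info.
  private static final Set<String> S3_SCHEMES = Set.of("s3a");

  // Returns the lower-cased scheme of the location, or "" if it has none.
  static String schemeOf(String location) {
    String scheme = URI.create(location).getScheme();
    return scheme == null ? "" : scheme.toLowerCase(Locale.ROOT);
  }

  // Scheme-based replacement for an instanceof-style check.
  static boolean isS3(String location) {
    return S3_SCHEMES.contains(schemeOf(location));
  }

  public static void main(String[] args) {
    System.out.println(isS3("s3a://bucket/path"));
    System.out.println(isS3("hdfs://namenode:8020/path"));
  }
}
```

The point of the sketch is that the decision depends only on a string, so a filesystem that reuses an existing class under a new scheme is not matched by accident.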
Checking the Java class and checking the scheme are not exactly equivalent, because in some cases a new scheme is handled by a known class (or a subclass of that class). That's what happened with Alluxio in IMPALA-10087, where we accidentally supported it for a bit until we broke it.
But since IMPALA-6050 we need to check both the scheme and the class, so it would be better at this point to just standardise on the scheme, AFAICT.
In future we could then conceivably remove some of this hardcoded logic and consolidate the information about filesystem capabilities into one place.
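One possible shape for that consolidation, as a sketch only: a single table keyed by scheme that lists what each filesystem can do. The capability names and the values in the table are placeholders for illustration, not a statement of what Impala actually supports on each filesystem, and the class is hypothetical.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

final class FsCapabilitiesSketch {
  // Hypothetical capability flags; the real ones would be derived from the
  // existing hardcoded checks.
  enum Capability { WRITABLE, STORAGE_IDS }

  // Placeholder table: scheme -> capabilities. Values are illustrative only.
  private static final Map<String, Set<Capability>> BY_SCHEME = Map.of(
      "hdfs", EnumSet.of(Capability.WRITABLE, Capability.STORAGE_IDS),
      "s3a", EnumSet.of(Capability.WRITABLE));

  // Unknown schemes get no capabilities rather than inheriting them from
  // whichever class happens to implement them.
  static boolean supports(String scheme, Capability capability) {
    return BY_SCHEME
        .getOrDefault(scheme, EnumSet.noneOf(Capability.class))
        .contains(capability);
  }
}
```

With a table like this, adding a filesystem means adding one entry instead of touching every check.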