Thanks, Ben Podgursky, for sharing your thoughts. We are glad to hear voices from Hadoop users as well as from developers of Hadoop and other related projects.
The number of watchers on this ticket should be an indicator in itself, and I'm sure for every person who bothered creating a jira account, there are a couple hundred who were bitten by this.
The number of watchers doesn't indicate that they support this patch, only that they are paying attention to it. They have to, because this patch breaks everything. We can bring this topic to the hadoop-user alias to reach a wider audience if needed.
Has anyone contacted the maintainers on the major affected downstream projects like Spark and HBase? My guess is that they would be more than happy to help work around any breakages this upgrade causes – if they are like us, they would be overjoyed at being able to finally upgrade.
I doubt this. I talked offline with several people from the HBase and Tez communities, and none of them are happy about making the same change. Our change would push them to create and maintain separate branches for different Hadoop versions (unless the shading work in HADOOP-14284 lands, which has its own side effects). It also means releases would need to be synchronized; otherwise no downstream project release would work with the Hadoop GA release, which would be an even bigger problem.
Hadoop's ancient Guava dependency is the single largest issue we run into when putting any other third party jar on our client or task classpath.
Please don't blame Hadoop for this. The real problem here is Guava's poor compatibility across versions: they released 21 major versions in 7 years. Hadoop's ancient Guava dependency suffers from this too.
There are many libraries developed outside the Hadoop ecosystem which (rightfully) assume they can use a newer version of Guava, and we regularly either have to do horrible hacks to get these onto the client classpath, or tell developers not to use them. This is an incredible time-waster that shouldn't exist.
Is there any agreement among these libraries outside Hadoop to standardize on a single Guava version (such as version 21)? If not, bumping the Guava version here doesn't help, as it will still break the libraries that use a different version. There are also many applications developed within the Hadoop ecosystem, which should be treated as first-class citizens, that would be completely broken by this change. As system software, we should provide a better solution, such as application classpath isolation, instead of simply keeping dependencies up to date.
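To make "application classpath isolation" a bit more concrete, here is a minimal Java sketch of a child-first class loader. An application's own jars are searched before the parent, so the application could bundle its own Guava, while JDK classes and an assumed list of "parent-first" packages always come from the parent loader. The class name ChildFirstClassLoader and the PARENT_FIRST_PREFIXES list are hypothetical choices for illustration, not Hadoop's actual implementation or configuration.

```java
import java.net.URL;
import java.net.URLClassLoader;

/**
 * Hypothetical child-first class loader sketch (not Hadoop's own code).
 * Non-system classes are looked up in the application's jars first,
 * falling back to normal parent delegation if they are not found there.
 */
public class ChildFirstClassLoader extends URLClassLoader {
    // Assumed set of packages that must always be loaded by the parent,
    // so that JDK and framework types stay shared and consistent.
    private static final String[] PARENT_FIRST_PREFIXES = {
        "java.", "javax.", "org.apache.hadoop."
    };

    public ChildFirstClassLoader(URL[] appJars, ClassLoader parent) {
        super(appJars, parent);
    }

    /** Returns true if the class must be delegated to the parent first. */
    static boolean isParentFirst(String className) {
        for (String prefix : PARENT_FIRST_PREFIXES) {
            if (className.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null && !isParentFirst(name)) {
                try {
                    // Look in the application's own jars first (e.g. its own Guava).
                    c = findClass(name);
                } catch (ClassNotFoundException ignored) {
                    // Not bundled by the application; fall back to the parent below.
                }
            }
            if (c == null) {
                // Standard parent-first delegation for everything else.
                c = super.loadClass(name, false);
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}
```

Each application would get its own loader instance pointed at its own jars. This only helps when Guava stays an internal detail on each side: if Guava types appear in APIs shared between Hadoop and application code, the two copies would still clash.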
I understand the concerns about stability, but this is a major upgrade, no? I don't think it's acceptable to say that Hadoop will be running Guava 11 until Hadoop 4 comes out in 2025.
Guava is not Java; I don't think any Hadoop release should be bound to a specific Guava version. It is just a third-party library (with poor compatibility), that's it!