We don't use transitive dependencies at the moment, because we want to be sure which libs are added, and for the binary distribution we need to add license notes (which cannot be generated by Ivy) for every single JAR. So we would simply remove the dependency on ucar.
Gotcha, OK, cool.
The parser is still listed in META-INF, so when a Java 5 user tries to parse a NetCDF file, he gets a ClassNotFound from the NetCDF parser.
Couldn't you take the Parser out of the file:
(e.g., the service loading mechanism). If you remove the org.apache.tika.parser.netcdf.NetCDFParser and org.apache.tika.parser.hdf.HDFParser entries from that file, the user will never reach the NetCDF or HDF parser, right? I think you guys can provide your own custom copy of this file and make sure it's at the root of the classpath in Solr Cell; then it will take your version over the one baked into the tika-parsers jar.
It would be good to pass a META-INF-like list to the AutoDetectParser (I implemented that for another non-Solr project we use at PANGAEA, where I took Tika's META-INF list, deleted all unused parsers and passed the rest somehow to Tika).
This sounds cool. How is it different from the service provider mechanism, though? I think it's serving a similar purpose, right?
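For what it's worth, the "take the bundled list and delete the unused parsers" idea can be sketched with plain JDK code. This is only an illustration under stated assumptions, not what PANGAEA or Tika actually does: the class name ParserListFilter, the filter method and the sample list in main are all made up for the example, and the input is assumed to follow the usual provider-configuration layout (one class name per line, '#' starting a comment).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Tika or Solr.
final class ParserListFilter {

    /**
     * Reads provider-configuration style text (one class name per line,
     * '#' starts a comment) and returns the names that are not excluded,
     * in their original order.
     */
    static List<String> filter(String serviceFileText, Set<String> excluded) {
        List<String> kept = new ArrayList<String>();
        for (String line : serviceFileText.split("\n")) {
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash); // strip trailing comment
            }
            String name = line.trim();
            if (name.length() > 0 && !excluded.contains(name)) {
                kept.add(name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Illustrative sample only; the real bundled list is much longer.
        String bundled = "# sample parser list\n"
            + "org.apache.tika.parser.txt.TXTParser\n"
            + "org.apache.tika.parser.netcdf.NetCDFParser\n"
            + "org.apache.tika.parser.hdf.HDFParser\n";
        Set<String> excluded = new HashSet<String>(Arrays.asList(
            "org.apache.tika.parser.netcdf.NetCDFParser",
            "org.apache.tika.parser.hdf.HDFParser"));
        System.out.println(filter(bundled, excluded));
    }
}
```

The surviving names could then be instantiated reflectively and handed to the AutoDetectParser. As far as I know it has a constructor that accepts an explicit set of parsers, but check that against the Tika version you build with.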
A good idea for Tika would be to have several tika-parsers packages, maybe one each for "office document parsers", "images", ... Are there any plans to split the parser package?
This was discussed a while back; check out the thoughts there: https://issues.apache.org/jira/browse/TIKA-686
I tried this a few weeks ago and with JDK 1.5, tests were failing.
Our latest Jenkins build (which I think is locked to 1.5) passes (look at the one before I started mucking with tika-server):