Description
While working on a few other issues, the following flaws have been detected in the CachedFileEntityResolver and need to be addressed:
- If a resource is only available through HTTP and the target redirects to HTTPS, HttpURLConnection will not follow the redirect by default and, even worse, it will not notify us
- Our required resources fml.xsd, xdoc.xsd and xml.xsd are only checked by system id and not by public id, which means you need to take care of multiple URLs
- It performs outbound connections for resources which could be available offline, e.g., the schemas from above
- It logs no information about what is happening, making debugging very hard
- If a document does not supply a schema or DTD, the validation fails although logically there is nothing to validate. E.g., HTML5 is now schemaless with a mere <!DOCTYPE html>.
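To illustrate the first point, here is a minimal sketch of how a cross-protocol redirect could be detected and followed by hand. It is not the resolver's current code: the class RedirectCheck and its methods are hypothetical names, and it assumes the resource is addressed by an absolute http or https URI.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.logging.Logger;

// Hypothetical helper, not part of the existing resolver.
final class RedirectCheck {
    private static final Logger LOG = Logger.getLogger(RedirectCheck.class.getName());

    private RedirectCheck() {
    }

    /** Returns the redirect target for a redirect status with a Location header, else null. */
    static URI redirectTarget(URI source, int status, String location) {
        boolean redirect = status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
        if (!redirect || location == null) {
            return null;
        }
        // Location may be relative, so resolve it against the requested URI.
        return source.resolve(location);
    }

    /** True if the redirect switches scheme, e.g. http to https. */
    static boolean changesScheme(URI source, URI target) {
        return !source.getScheme().equalsIgnoreCase(target.getScheme());
    }

    /** Opens a connection and follows redirects manually, including scheme changes. */
    static HttpURLConnection openFollowingRedirects(URI uri, int maxHops) throws IOException {
        URI current = uri;
        for (int hop = 0; hop <= maxHops; hop++) {
            HttpURLConnection conn = (HttpURLConnection) current.toURL().openConnection();
            // Handle redirects ourselves so that none is dropped silently.
            conn.setInstanceFollowRedirects(false);
            URI target = redirectTarget(current, conn.getResponseCode(),
                    conn.getHeaderField("Location"));
            if (target == null) {
                return conn;
            }
            LOG.fine("Redirect from " + current + " to " + target
                    + (changesScheme(current, target) ? " (scheme change)" : ""));
            conn.disconnect();
            current = target;
        }
        throw new IOException("Too many redirects for " + uri);
    }
}
```

Handling the redirect manually also gives a natural place to log what happened, which addresses the missing-logging point at the same time.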
Things to be done:
- Have all supported schemas in the classpath for fast access
- Remove all unused schemas
- Provide a public id to classpath resource mapping to avoid alternating system ids
- Add debug logging to assist analysis
- Don't fail if a schema is not provided
- If the URL is a file, load it directly because file IO is fast.
Likely other points.
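A rough sketch of the public id to classpath resource mapping, with debug logging and no failure when nothing matches, might look like the following. The class name ClasspathEntityResolver, the lookup method and any public ids or resource paths passed to it are hypothetical; only org.xml.sax.EntityResolver and InputSource are existing API. Returning null from resolveEntity leaves the parser to its default handling of the system id.

```java
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Logger;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical resolver: maps public ids to resources bundled on the classpath.
final class ClasspathEntityResolver implements EntityResolver {
    private static final Logger LOG = Logger.getLogger(ClasspathEntityResolver.class.getName());

    private final Map<String, String> resourcesByPublicId;

    ClasspathEntityResolver(Map<String, String> resourcesByPublicId) {
        this.resourcesByPublicId = new HashMap<>(resourcesByPublicId);
    }

    /** Returns the classpath resource registered for the public id, or null. */
    String lookup(String publicId) {
        return publicId == null ? null : resourcesByPublicId.get(publicId);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String resource = lookup(publicId);
        if (resource == null) {
            LOG.fine("No classpath mapping for public id " + publicId
                    + ", system id " + systemId);
            return null; // do not fail, let the parser fall back to its default
        }
        InputStream in = ClasspathEntityResolver.class.getResourceAsStream(resource);
        if (in == null) {
            LOG.fine("Mapped resource " + resource + " not found on the classpath");
            return null;
        }
        LOG.fine("Resolved public id " + publicId + " to classpath resource " + resource);
        InputSource source = new InputSource(in);
        source.setPublicId(publicId);
        source.setSystemId(systemId);
        return source;
    }
}
```

Because the lookup key is the public id, the same bundled schema is served regardless of which system id URL a document happens to use.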