Description
I know this has been an issue before PDFBOX-693, but since then a lot of time has passed. I propose to switch Logging to either SLF4J (my preference) or Log4J.
Why? Currently, PDFBox uses commons logging. That library seems not to be actively maintained anymore, the last release was in 2014 with the comment „the main purpose of the 1.2 release is to drop support for Java 1.1“. PDFBox 4 requires at least Java 11. The Java language has evolved a lot over the last few years, and so have to logging frameworks.
Maintainabilty and Performance
The main point that speaks against commons logging IMHO is that you have to make compromises between readability and performance. Let’s look at an example (taken from COSParser.java):
if (offsetOrObjstmObNr != null) { LOG.debug("Set missing offset " + offsetOrObjstmObNr + " for object " + objKey); document.getXrefTable().put(objKey, offsetOrObjstmObNr); }
Each time the if-statement's body is entered, the logging message is constructed. This involves:
- converting offsetOrObjstmObNr to a String
- converting objKey to a String which in turn does two more conversions of integer and long values to a String, creates a new String by joining results of former conversions
- creating a new String consisting of the results of the both operations above and some static text
A lot of temporary objects are created and all of this happens regardless of whether logging is necessary or not.
Both SLF4J and Log4J (and even JUL logging) support a message format syntax where arguments to logging calls are only evaluated when needed:
if (offsetOrObjstmObNr != null) { LOG.debug("Set missing offset {} for object {}", offsetOrObjstmObNr, objKey); document.getXrefTable().put(objKey, offsetOrObjstmObNr); }
Using common s logging, you can guard the logging statement:
if (offsetOrObjstmObNr != null) { if (LOG.isDebugEnabled()) { LOG.debug("Set missing offset " + offsetOrObjstmObNr + " for object " + objKey); } document.getXrefTable().put(objKey, offsetOrObjstmObNr); }
This works, but makes the code less readable and harder to maintain because the level has to be changed in two places instead of one.
Flexibility
Another thing is that both SLF4J and Log4J offer you a facade that you can bridge to nearly every logging implementation in widespread use nowadays whereas commons logging is a full logging implementation that most of us do not need (there are ways to make commons logging work with other backends, but they are more of a workaround than a clean solution).
JPMS support
And of course, both SLF4J and Log4J offer first class support for the module system.
So what are the PDFBox maintainer's thoughts on this? I think PDFBox 4 offers an opportunity to switch to a new Logging facade/framework, and if there's a concensus, I volunteer to contribute a patch to make the transition.
Support Lambda Logging
In some cases, even using a message format ist not enough, for example there are instances in the code where a `byte[]` value is logged as a String. Since the String conversion has to be done explicitly, it would be done before the actual logging method is entered. This can be solved using lambdas by passing `() -> new String(value)` instead (syntax may vary).
Alternatives
The alternatives (sorted according to my own preference) are:
- use SLF4J as logging facade
- use Log4J2 as logging facade
- use JUL logging (comes with a cost when used with other logging backends)
- stay with commons logging and add guards in the code where they are not yet present
- stay with commons logging and try to convince the maintainers (if there still are any) to add a message format support that is comparable to what the other three and then switch to the new version of commons logging once the functionality is there (and yes, if there's a chance this gets approved by the maintainers I'd see if I can contribute the necessary changes to commons logging provided the idea gets support over there)
Cheers,
Axel