Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.14, 1.15
    • Fix Version/s: 1.15
    • Component/s: parser
    • Labels:

      Description

      Tika parsers sometimes use Log4j's Logger, sometimes the JUL (java.util.logging) Logger and sometimes SLF4j.

      It would be better to standardise on a single facade, for the sake of not having to configure multiple loggers.

        Issue Links

          Activity

          Hide
          tallison@mitre.org Tim Allison added a comment -

          Sounds great. Historically, we've avoided logging in the parsers see this thread...this isn't to say we shouldn't revisit this.

          Please also see (and update?): https://wiki.apache.org/tika/Logging

          Show
          tallison@mitre.org Tim Allison added a comment - Sounds great. Historically, we've avoided logging in the parsers see this thread ...this isn't to say we shouldn't revisit this. Please also see (and update?): https://wiki.apache.org/tika/Logging
          Hide
          grossws Konstantin Gribov added a comment -

          Somewhere in the past we decided to avoid switching all logging to slf4j because it would make logging dependency management harder for downstream users.

          And wiki page about logging is almost up-to-date. I'll update deps versions but there's nothing more to do.

          Show
          grossws Konstantin Gribov added a comment - Somewhere in the past we decided to avoid switching all logging to slf4j because it would make logging dependency management harder for downstream users. And wiki page about logging is almost up-to-date. I'll update deps versions but there's nothing more to do.
          Show
          grossws Konstantin Gribov added a comment - See also discussion at https://lists.apache.org/thread.html/9d2ad6d41aa197808a6029173d232c43b222c3936a15a438fa92b9eb@1394136678@%3Cdev.tika.apache.org%3E
          Hide
          tallison@mitre.org Tim Allison added a comment -

          Konstantin Gribov, agreed. However, it looks like slf4j has slipped into the ner/recognition parsers. And, mea culpa, we now have log4j in the new WordPerfect parsers...ugh, and sorry.

          So, I think Matthew Caruana Galizia's proposal is to rid tika-parsers of slf4j and log4j.

          Show
          tallison@mitre.org Tim Allison added a comment - Konstantin Gribov , agreed. However, it looks like slf4j has slipped into the ner/recognition parsers. And, mea culpa, we now have log4j in the new WordPerfect parsers...ugh, and sorry. So, I think Matthew Caruana Galizia 's proposal is to rid tika-parsers of slf4j and log4j.
          Hide
          mcaruanagalizia Matthew Caruana Galizia added a comment -

          So should we agree that parsers should use ONLY JUL and rid it of slf4j and log4j? Or should we standardise on slf4j?

          Show
          mcaruanagalizia Matthew Caruana Galizia added a comment - So should we agree that parsers should use ONLY JUL and rid it of slf4j and log4j? Or should we standardise on slf4j?
          Hide
          grossws Konstantin Gribov added a comment - - edited

          Tim Allison, Matthew Caruana Galizia, we can't get rid of slf4j-api, since many upstream parsers depends on them (rome, netcdf, grib to name some). For exaclty same reason we can't drop commons-logging-api.

          I didn't see log4j in tika-parsers deps and it's better to avoid if no upstream deps use its api directly.

          I personally prefer parsers to use slf4j-api but current status quo is that parser often use same logging framework as its upstream dependency does.

          Show
          grossws Konstantin Gribov added a comment - - edited Tim Allison , Matthew Caruana Galizia , we can't get rid of slf4j-api , since many upstream parsers depends on them (rome, netcdf, grib to name some). For exaclty same reason we can't drop commons-logging-api . I didn't see log4j in tika-parsers deps and it's better to avoid if no upstream deps use its api directly. I personally prefer parsers to use slf4j-api but current status quo is that parser often use same logging framework as its upstream dependency does.
          Hide
          tallison@mitre.org Tim Allison added a comment -

          we can't get rid of slf4j-api

          Y, agreed.

          it's better to avoid if no upstream deps use its api directly....I personally prefer parsers to use slf4j-api

          Konstantin Gribov, sounds like you wouldn't object to unifying the Tika code w/in tika-parsers to use only JUL given the current status quo?

          Show
          tallison@mitre.org Tim Allison added a comment - we can't get rid of slf4j-api Y, agreed. it's better to avoid if no upstream deps use its api directly....I personally prefer parsers to use slf4j-api Konstantin Gribov , sounds like you wouldn't object to unifying the Tika code w/in tika-parsers to use only JUL given the current status quo?
          Hide
          grossws Konstantin Gribov added a comment -

          Tim Allison, I accept using JUL in tika-core since we have no deps there (and will have only commons-io at some later point). And bringing slf4j-api may be worth considering if we will use external deps in core anyway. At least discussable thing, I guess.

          But I'm against using JUL in tika-parsers: it's hard to configure, it cannot be directed to other logging backend without using jul-to-slf4j, it have extra runtime cost like string concatenation and calling toString() for any object which state is logged.

          Anyway, we'll always have deps which use JUL, commons-logging-api and slf4j-api because logging api choose is up to upstream developers in their libs. Why not use common and present dependency to log anything through it?

          When I use Tika in my downstream projects I configure logging as I described at wiki page: logback-classic as impl (usually runtime dependency) and jul-to-slf4j, jcl-over-slf4j, log4j-over-slf4j to forward all other logging frameworks apis via slf4j-api to logback. It works perfectly, has one point to configure etc.

          Show
          grossws Konstantin Gribov added a comment - Tim Allison , I accept using JUL in tika-core since we have no deps there (and will have only commons-io at some later point). And bringing slf4j-api may be worth considering if we will use external deps in core anyway. At least discussable thing, I guess. But I'm against using JUL in tika-parsers : it's hard to configure, it cannot be directed to other logging backend without using jul-to-slf4j , it have extra runtime cost like string concatenation and calling toString() for any object which state is logged. Anyway, we'll always have deps which use JUL, commons-logging-api and slf4j-api because logging api choose is up to upstream developers in their libs. Why not use common and present dependency to log anything through it? When I use Tika in my downstream projects I configure logging as I described at wiki page: logback-classic as impl (usually runtime dependency) and jul-to-slf4j , jcl-over-slf4j , log4j-over-slf4j to forward all other logging frameworks apis via slf4j-api to logback. It works perfectly, has one point to configure etc.
          Hide
          tallison@mitre.org Tim Allison added a comment -

          Why not use common and present dependency to log anything through it?

          So the proposal is to use JUL in core (as we're doing now), but to use commons-logging within our tika-parser code?

          So to "standardize logging", we'll get rid of log4j, slf4j and JUL loggers in our tika-parsers?

          Thank you!

          Show
          tallison@mitre.org Tim Allison added a comment - Why not use common and present dependency to log anything through it? So the proposal is to use JUL in core (as we're doing now), but to use commons-logging within our tika-parser code? So to "standardize logging", we'll get rid of log4j, slf4j and JUL loggers in our tika-parsers? Thank you!
          Hide
          grossws Konstantin Gribov added a comment -

          Tim Allison, I guess Matthew Caruana Galizia wants just to simplify logger configuration as downstream user. As a facade slf4j is currently best option, and it's designed for such usecases. I don't know how simple is to configure slf4j in OSGi environment, maybe Bob Paulin can add something about that.

          So, I propose to do all logging via slf4j-api, add jul-to-slf4j, jcl-over-slf4j and commons-logging-api to tika-parsers dependencies, exclude commons-logging from it and recommend users to add slf4j-log4j12 as default logging dependency (we should avoid adding it ourselves since it's not library responsibility). It allows user to opt-out and use logback-classic + log4j-over-slf4j if he wants. Alternative way should also be documented.

          If it's OK, I'll push it to master soon.

          Show
          grossws Konstantin Gribov added a comment - Tim Allison , I guess Matthew Caruana Galizia wants just to simplify logger configuration as downstream user. As a facade slf4j is currently best option, and it's designed for such usecases. I don't know how simple is to configure slf4j in OSGi environment, maybe Bob Paulin can add something about that. So, I propose to do all logging via slf4j-api , add jul-to-slf4j , jcl-over-slf4j and commons-logging-api to tika-parsers dependencies, exclude commons-logging from it and recommend users to add slf4j-log4j12 as default logging dependency (we should avoid adding it ourselves since it's not library responsibility). It allows user to opt-out and use logback-classic + log4j-over-slf4j if he wants. Alternative way should also be documented. If it's OK, I'll push it to master soon.
          Hide
          tallison@mitre.org Tim Allison added a comment -

          Great. Thank you!

          Show
          tallison@mitre.org Tim Allison added a comment - Great. Thank you!
          Hide
          bobpaulin Bob Paulin added a comment - - edited

          Konstantin Gribov In my experience OSGi is pretty unopinionated on logging. For exmaple Pax-logging is what I often use https://ops4j1.jira.com/wiki/display/ops4j/Using+pax+logging . SLF4J is supported

          Show
          bobpaulin Bob Paulin added a comment - - edited Konstantin Gribov In my experience OSGi is pretty unopinionated on logging. For exmaple Pax-logging is what I often use https://ops4j1.jira.com/wiki/display/ops4j/Using+pax+logging . SLF4J is supported
          Hide
          grossws Konstantin Gribov added a comment -

          Thanks, Bob Paulin.

          I'm currently testing after refactoring and will publish my patch for review before pushing it to master.

          Show
          grossws Konstantin Gribov added a comment - Thanks, Bob Paulin . I'm currently testing after refactoring and will publish my patch for review before pushing it to master.
          Hide
          grossws Konstantin Gribov added a comment -

          Patchset is in logging-refactored branch, see also thread at dev@

          Show
          grossws Konstantin Gribov added a comment - Patchset is in logging-refactored branch , see also thread at dev@
          Hide
          tallison@mitre.org Tim Allison added a comment -

          +1, I'm sorry you had to redo so much of my code. Thank you!

          Show
          tallison@mitre.org Tim Allison added a comment - +1, I'm sorry you had to redo so much of my code. Thank you!
          Hide
          grossws Konstantin Gribov added a comment -

          Tim Allison, don't mention it, it's just a negligible downside of common repo in front of benefits of opes source project ,)

          I've also found that tika-eval use log4j's Level and its xml logging format, so it's coupled more tightly to log4j than I thought.
          I think, it should be stay as is, since it's a part of standalone tool you use for regression tests.

          Show
          grossws Konstantin Gribov added a comment - Tim Allison , don't mention it, it's just a negligible downside of common repo in front of benefits of opes source project ,) I've also found that tika-eval use log4j's Level and its xml logging format, so it's coupled more tightly to log4j than I thought. I think, it should be stay as is, since it's a part of standalone tool you use for regression tests.
          Hide
          grossws Konstantin Gribov added a comment -

          Merged into master

          Show
          grossws Konstantin Gribov added a comment - Merged into master
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build Tika-trunk #1232 (See https://builds.apache.org/job/Tika-trunk/1232/)
          TIKA-2245 Logging unification (grossws: https://github.com/apache/tika/commit/e8ee4ce7fbdc4fd8d0a496731f2b0a481687b72f)

          • (edit) tika-batch/src/main/java/org/apache/tika/batch/fs/BasicTikaFSConsumer.java
          • (edit) tika-parsers/src/main/java/org/apache/tika/parser/recognition/tf/TensorflowImageRecParser.java
          • (edit) tika-parsers/src/main/java/org/apache/tika/parser/xml/ElementMetadataHandler.java
          • (edit) tika-batch/src/main/java/org/apache/tika/batch/FileResourceCrawler.java
          • (edit) tika-eval/src/main/java/org/apache/tika/eval/reports/Report.java
          • (edit) tika-server/src/main/java/org/apache/tika/server/TikaServerCli.java
          • (edit) tika-parsers/src/main/java/org/apache/tika/parser/ocr/TesseractOCRParser.java
          • (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ListManager.java
          • (edit) tika-parsers/src/main/java/org/apache/tika/parser/ner/mitie/MITIENERecogniser.java
          • (edit) tika-batch/src/main/java/org/apache/tika/batch/Interrupter.java
          • (edit) tika-batch/src/main/java/org/apache/tika/batch/StatusReporter.java
          • (edit) tika-server/src/main/java/org/apache/tika/server/resource/TikaResource.java
          • (edit) tika-server/src/main/java/org/apache/tika/server/resource/LanguageResource.java
          • (edit) tika-parsers/src/main/java/org/apache/tika/parser/wordperfect/WP6Charsets.java
          • (edit) tika-eval/src/main/java/org/apache/tika/eval/XMLErrorLogUpdater.java
          • (edit) tika-parsers/src/main/java/org/apache/tika/parser/html/HtmlParser.java
          • (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/AbstractPOIFSExtractor.java
          • (edit) tika-server/src/main/java/org/apache/tika/server/resource/DetectorResource.java
          • (edit) tika-server/src/main/java/org/apache/tika/server/resource/UnpackerResource.java
          • (edit) tika-app/src/main/java/org/apache/tika/cli/TikaCLI.java
          • (edit) tika-parsers/src/main/java/org/apache/tika/parser/geo/topic/GeoParserConfig.java
          • (edit) tika-batch/src/main/java/org/apache/tika/batch/fs/AbstractFSConsumer.java
          • (edit) tika-parsers/src/main/java/org/apache/tika/parser/recognition/ObjectRecognitionParser.java
          • (edit) tika-batch/pom.xml
          • (edit) tika-eval/src/main/java/org/apache/tika/eval/reports/ResultsReporter.java
          • (edit) tika-parsers/src/main/java/org/apache/tika/parser/pot/PooledTimeSeriesParser.java
          • (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/SummaryExtractor.java
          • (edit) tika-parsers/src/main/java/org/apache/tika/parser/wordperfect/WP5Charsets.java
          • (edit) tika-server/src/main/java/org/apache/tika/server/resource/RecursiveMetadataResource.java
          • (edit) tika-example/pom.xml
          • (edit) tika-server/src/main/java/org/apache/tika/server/resource/TranslateResource.java
          • (edit) tika-example/src/main/java/org/apache/tika/example/LazyTextExtractorField.java
          • (edit) tika-batch/src/main/java/org/apache/tika/batch/fs/RecursiveParserWrapperFSConsumer.java
          • (edit) tika-eval/src/main/java/org/apache/tika/eval/io/DBWriter.java
          • (edit) tika-eval/src/main/java/org/apache/tika/eval/io/XMLLogReader.java
          • (edit) tika-parsers/src/test/java/org/apache/tika/parser/ner/nltk/NLTKNERecogniserTest.java
          • (edit) tika-batch/src/main/java/org/apache/tika/batch/BatchProcess.java
          • (edit) tika-eval/src/main/java/org/apache/tika/eval/io/ExtractReader.java
          • (edit) tika-parsers/src/main/java/org/apache/tika/parser/wordperfect/QPWTextExtractor.java
          • (edit) tika-app/pom.xml
          • (edit) tika-eval/src/main/java/org/apache/tika/eval/tokens/CommonTokenCountManager.java
          • (edit) tika-parsers/src/main/java/org/apache/tika/parser/recognition/tf/TensorflowRESTRecogniser.java
          • (edit) tika-batch/src/main/java/org/apache/tika/batch/fs/FSDirectoryCrawler.java
          • (edit) tika-parent/pom.xml
          • (edit) tika-eval/src/main/java/org/apache/tika/eval/db/JDBCUtil.java
          • (edit) tika-eval/src/main/java/org/apache/tika/eval/AbstractProfiler.java
          • (edit) tika-parsers/pom.xml
          • (edit) tika-server/pom.xml
          • (edit) tika-batch/src/main/java/org/apache/tika/batch/fs/FSListCrawler.java
          • (add) tika-server/src/main/resources/log4j.properties
          • (add) tika-bundle/src/test/resources/log4j.properties
          • (edit) tika-parsers/src/main/java/org/apache/tika/parser/ner/NamedEntityParser.java
          • (edit) tika-batch/src/main/java/org/apache/tika/batch/BatchProcessDriverCLI.java
          • (delete) tika-server/src/main/resources/commons-logging.properties
          • (edit) tika-server/src/main/java/org/apache/tika/server/resource/MetadataResource.java
          • (edit) tika-parsers/src/main/java/org/apache/tika/parser/geo/topic/GeoParser.java
          • (edit) tika-parsers/src/main/java/org/apache/tika/parser/geo/topic/gazetteer/GeoGazetteerClient.java
          • (edit) tika-bundle/pom.xml
          • (edit) tika-parsers/src/main/java/org/apache/tika/parser/image/ImageParser.java
          • (edit) tika-server/src/main/java/org/apache/tika/server/TikaLoggingFilter.java
          • (edit) tika-batch/src/main/java/org/apache/tika/batch/fs/strawman/StrawManTikaAppDriver.java
          • (edit) tika-batch/src/main/java/org/apache/tika/batch/fs/FSBatchProcessCLI.java
          • (edit) tika-batch/src/main/java/org/apache/tika/batch/FileResourceConsumer.java
          • (edit) tika-example/src/main/java/org/apache/tika/example/ImportContextImpl.java
            TIKA-2245 Updated CHANGES.txt with info about logging (grossws: https://github.com/apache/tika/commit/7ce58d6592733e314bdf26065ed4d0e123f1fe60)
          • (edit) CHANGES.txt
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Jenkins build Tika-trunk #1232 (See https://builds.apache.org/job/Tika-trunk/1232/ ) TIKA-2245 Logging unification (grossws: https://github.com/apache/tika/commit/e8ee4ce7fbdc4fd8d0a496731f2b0a481687b72f ) (edit) tika-batch/src/main/java/org/apache/tika/batch/fs/BasicTikaFSConsumer.java (edit) tika-parsers/src/main/java/org/apache/tika/parser/recognition/tf/TensorflowImageRecParser.java (edit) tika-parsers/src/main/java/org/apache/tika/parser/xml/ElementMetadataHandler.java (edit) tika-batch/src/main/java/org/apache/tika/batch/FileResourceCrawler.java (edit) tika-eval/src/main/java/org/apache/tika/eval/reports/Report.java (edit) tika-server/src/main/java/org/apache/tika/server/TikaServerCli.java (edit) tika-parsers/src/main/java/org/apache/tika/parser/ocr/TesseractOCRParser.java (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ListManager.java (edit) tika-parsers/src/main/java/org/apache/tika/parser/ner/mitie/MITIENERecogniser.java (edit) tika-batch/src/main/java/org/apache/tika/batch/Interrupter.java (edit) tika-batch/src/main/java/org/apache/tika/batch/StatusReporter.java (edit) tika-server/src/main/java/org/apache/tika/server/resource/TikaResource.java (edit) tika-server/src/main/java/org/apache/tika/server/resource/LanguageResource.java (edit) tika-parsers/src/main/java/org/apache/tika/parser/wordperfect/WP6Charsets.java (edit) tika-eval/src/main/java/org/apache/tika/eval/XMLErrorLogUpdater.java (edit) tika-parsers/src/main/java/org/apache/tika/parser/html/HtmlParser.java (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/AbstractPOIFSExtractor.java (edit) tika-server/src/main/java/org/apache/tika/server/resource/DetectorResource.java (edit) tika-server/src/main/java/org/apache/tika/server/resource/UnpackerResource.java (edit) tika-app/src/main/java/org/apache/tika/cli/TikaCLI.java (edit) tika-parsers/src/main/java/org/apache/tika/parser/geo/topic/GeoParserConfig.java (edit) tika-batch/src/main/java/org/apache/tika/batch/fs/AbstractFSConsumer.java (edit) tika-parsers/src/main/java/org/apache/tika/parser/recognition/ObjectRecognitionParser.java (edit) tika-batch/pom.xml (edit) tika-eval/src/main/java/org/apache/tika/eval/reports/ResultsReporter.java (edit) tika-parsers/src/main/java/org/apache/tika/parser/pot/PooledTimeSeriesParser.java (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/SummaryExtractor.java (edit) tika-parsers/src/main/java/org/apache/tika/parser/wordperfect/WP5Charsets.java (edit) tika-server/src/main/java/org/apache/tika/server/resource/RecursiveMetadataResource.java (edit) tika-example/pom.xml (edit) tika-server/src/main/java/org/apache/tika/server/resource/TranslateResource.java (edit) tika-example/src/main/java/org/apache/tika/example/LazyTextExtractorField.java (edit) tika-batch/src/main/java/org/apache/tika/batch/fs/RecursiveParserWrapperFSConsumer.java (edit) tika-eval/src/main/java/org/apache/tika/eval/io/DBWriter.java (edit) tika-eval/src/main/java/org/apache/tika/eval/io/XMLLogReader.java (edit) tika-parsers/src/test/java/org/apache/tika/parser/ner/nltk/NLTKNERecogniserTest.java (edit) tika-batch/src/main/java/org/apache/tika/batch/BatchProcess.java (edit) tika-eval/src/main/java/org/apache/tika/eval/io/ExtractReader.java (edit) tika-parsers/src/main/java/org/apache/tika/parser/wordperfect/QPWTextExtractor.java (edit) tika-app/pom.xml (edit) tika-eval/src/main/java/org/apache/tika/eval/tokens/CommonTokenCountManager.java (edit) tika-parsers/src/main/java/org/apache/tika/parser/recognition/tf/TensorflowRESTRecogniser.java (edit) tika-batch/src/main/java/org/apache/tika/batch/fs/FSDirectoryCrawler.java (edit) tika-parent/pom.xml (edit) tika-eval/src/main/java/org/apache/tika/eval/db/JDBCUtil.java (edit) tika-eval/src/main/java/org/apache/tika/eval/AbstractProfiler.java (edit) tika-parsers/pom.xml (edit) tika-server/pom.xml (edit) tika-batch/src/main/java/org/apache/tika/batch/fs/FSListCrawler.java (add) tika-server/src/main/resources/log4j.properties (add) tika-bundle/src/test/resources/log4j.properties (edit) tika-parsers/src/main/java/org/apache/tika/parser/ner/NamedEntityParser.java (edit) tika-batch/src/main/java/org/apache/tika/batch/BatchProcessDriverCLI.java (delete) tika-server/src/main/resources/commons-logging.properties (edit) tika-server/src/main/java/org/apache/tika/server/resource/MetadataResource.java (edit) tika-parsers/src/main/java/org/apache/tika/parser/geo/topic/GeoParser.java (edit) tika-parsers/src/main/java/org/apache/tika/parser/geo/topic/gazetteer/GeoGazetteerClient.java (edit) tika-bundle/pom.xml (edit) tika-parsers/src/main/java/org/apache/tika/parser/image/ImageParser.java (edit) tika-server/src/main/java/org/apache/tika/server/TikaLoggingFilter.java (edit) tika-batch/src/main/java/org/apache/tika/batch/fs/strawman/StrawManTikaAppDriver.java (edit) tika-batch/src/main/java/org/apache/tika/batch/fs/FSBatchProcessCLI.java (edit) tika-batch/src/main/java/org/apache/tika/batch/FileResourceConsumer.java (edit) tika-example/src/main/java/org/apache/tika/example/ImportContextImpl.java TIKA-2245 Updated CHANGES.txt with info about logging (grossws: https://github.com/apache/tika/commit/7ce58d6592733e314bdf26065ed4d0e123f1fe60 ) (edit) CHANGES.txt

            People

            • Assignee:
              grossws Konstantin Gribov
              Reporter:
              mcaruanagalizia Matthew Caruana Galizia
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development