Lucene - Core
  1. Lucene - Core
  2. LUCENE-3598

Improve InfoStream class in trunk to be more consistent with logging-frameworks like slf4j/log4j/commons-logging

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 4.0-ALPHA
    • Fix Version/s: 4.0-ALPHA
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Followup on a thread by Shai Erea on java-dev@lao: I already discussed with Robert about that, that there is one thing missing. Currently the IW only checks if the infoStream!=null and then passes the message to the method, and that may ignore it. For your requirement it is the case that this is enabled or disabled dynamically. Unfortunately if the construction of the message is heavy, then this wastes resources.

      I would like to add another method to this class: abstract boolean isEnabled() that can also be implemented. I would then replace all null checks in IW by this method. The default config in IW would be changed to use a NoOutputInfoStream that returns false here and ignores the message.

      A simple logger wrapper for e.g. log4j / slf4j then could look like (ignoring component, could be enabled):

      Loger log = YourLoggingFramework.getLogger(IndexWriter.class);
      
      public void message(String component, String message) {
        log.debug(component + ": " + message);
      }
      
      public boolean isEnabled(String component) {
        return log.isDebugEnabled();
      }
      

      Using this you could enable/disable logging live by e.g. the log4j management console of your app server by enabling/disabling IndexWriter.class logging.

      The changes are really simple:

      • PrintStreamInfoStream returns true, always, mabye make it dynamically enable/disable to allow Shai's request
      • infoStream.getDefault() is never null and can never be set to null. Instead the default is a singleton NoOutputInfoStream that returns false of isEnabled(component).
      • All null checks on infoStream should be replaced by infoStream.isEanbled(component), this is possible as always != null. There are no slowdowns by this - it's like Collections.emptyList() instead stupid null checks.
      1. LUCENE-3598.patch
        7 kB
        Uwe Schindler
      2. LUCENE-3598.patch
        8 kB
        Uwe Schindler
      3. LUCENE-3598.patch
        61 kB
        Uwe Schindler
      4. LUCENE-3598.patch
        63 kB
        Uwe Schindler
      5. LUCENE-3598.patch
        37 kB
        Michael McCandless

        Activity

        Hide
        Uwe Schindler added a comment - - edited

        Here a first patch to show the API.

        The current patch misses to replace all "if (infoStream != null)" by "if (infoStream.isEnabled())", so it of course slows down indexing because all messages are passed to the no-longer-null InfoStream

        I also have to check for other components using InfoStream except IndexWriter to add null checks with IAE.

        Show
        Uwe Schindler added a comment - - edited Here a first patch to show the API. The current patch misses to replace all "if (infoStream != null)" by "if (infoStream.isEnabled())", so it of course slows down indexing because all messages are passed to the no-longer-null InfoStream I also have to check for other components using InfoStream except IndexWriter to add null checks with IAE.
        Hide
        Uwe Schindler added a comment -

        Update method name in description.

        Show
        Uwe Schindler added a comment - Update method name in description.
        Hide
        Robert Muir added a comment - - edited

        I agree with the plan here

        (once we are sure all != null checks have been eliminated!)

        Just as an aside we may want to cut over other classes like CheckIndex and SegmentInfos to use this class too eventually, just for consistency. I think currently these still take PrintStreams.

        Show
        Robert Muir added a comment - - edited I agree with the plan here (once we are sure all != null checks have been eliminated!) Just as an aside we may want to cut over other classes like CheckIndex and SegmentInfos to use this class too eventually, just for consistency. I think currently these still take PrintStreams.
        Hide
        Uwe Schindler added a comment -

        This just moves the implementation-specific messageID to the PrintStream impl

        Show
        Uwe Schindler added a comment - This just moves the implementation-specific messageID to the PrintStream impl
        Hide
        Shai Erera added a comment -

        Why do I have a feeling like we're reinventing a logging framework? In the past, I've suggested that we use either JDK logging (simplest) and some have suggested SLF4J. Why don't we reconsider this again, removing InfoStream entirely? This will allow better logging control of Lucene modules and remove a whole lot of API to pass the InfoStream down to every class that we'll want to log debug messages.

        We can still include a Logger only for the classes that accept InfoStream today, but we won't need the setInfoStream() API + one will be able to turn on logging for just, say, DocumentsWriter.

        Show
        Shai Erera added a comment - Why do I have a feeling like we're reinventing a logging framework? In the past, I've suggested that we use either JDK logging (simplest) and some have suggested SLF4J. Why don't we reconsider this again, removing InfoStream entirely? This will allow better logging control of Lucene modules and remove a whole lot of API to pass the InfoStream down to every class that we'll want to log debug messages. We can still include a Logger only for the classes that accept InfoStream today, but we won't need the setInfoStream() API + one will be able to turn on logging for just, say, DocumentsWriter.
        Hide
        Robert Muir added a comment -

        We don't need logging for infostream.

        Most of the time someone just wants a simple configuration to dump to a file or they are
        a developer that wants to dump to stdout for a test.

        Please, lets not make this difficult for the 0.000001% of users with some complex use case.

        Show
        Robert Muir added a comment - We don't need logging for infostream. Most of the time someone just wants a simple configuration to dump to a file or they are a developer that wants to dump to stdout for a test. Please, lets not make this difficult for the 0.000001% of users with some complex use case.
        Hide
        Uwe Schindler added a comment -

        Ideally this InfoStream should be an simple 2 method interface, not even an abstract class.

        Show
        Uwe Schindler added a comment - Ideally this InfoStream should be an simple 2 method interface, not even an abstract class.
        Hide
        Uwe Schindler added a comment -

        Patch that replaces all infoStream != null occurences and renmoves checks where not needed (e.g. only message with static string).
        Somme comments:

        • removed IndexWriter.verbose()
        • MergePolicy code should be cleaned up and all stupid verbos() calls removed. Instead the logging should be implemented one level higher (MergePolicy abstract class)
        • DocState still has a strange PrintStream
        • same with different other classes, they should move to InfoStream
        • IndexFileDeleter requires InfoStream, for now I made a noncommit in DirectoryReader, which instantiates with InfoStream.NO_OUTPUT - this should maybe use the default?
        Show
        Uwe Schindler added a comment - Patch that replaces all infoStream != null occurences and renmoves checks where not needed (e.g. only message with static string). Somme comments: removed IndexWriter.verbose() MergePolicy code should be cleaned up and all stupid verbos() calls removed. Instead the logging should be implemented one level higher (MergePolicy abstract class) DocState still has a strange PrintStream same with different other classes, they should move to InfoStream IndexFileDeleter requires InfoStream, for now I made a noncommit in DirectoryReader, which instantiates with InfoStream.NO_OUTPUT - this should maybe use the default?
        Hide
        Uwe Schindler added a comment -

        Fix also SolrIndexWriter: remove TimeLoggingPrintStream: the default PrintStreamInfoStream of Lucene logs with data/time.
        Maybe add in the future the option to add all infoStream messages to the standard Solr SLF4J log as DEBUG/TRACE (only if really enabled - log.isDebugEnabled(SolrIndexWriter.class), so no synchronized logging shit occurs).

        Show
        Uwe Schindler added a comment - Fix also SolrIndexWriter: remove TimeLoggingPrintStream: the default PrintStreamInfoStream of Lucene logs with data/time. Maybe add in the future the option to add all infoStream messages to the standard Solr SLF4J log as DEBUG/TRACE (only if really enabled - log.isDebugEnabled(SolrIndexWriter.class), so no synchronized logging shit occurs).
        Hide
        Shai Erera added a comment -

        Please, lets not make this difficult for the 0.000001% of users with some complex use case.

        I think that the InfoStream solution is complex, and not vice versa. But that's just my opinion. And besides, not all of our code is consistent on the use of InfoStream. For instance, the facet module uses JDK logging (it came with it) in various classes. Do you suggest that we remove it and add setInfoStream API to all the classes that want to output logging info? I wouldn't want to do that ...

        Most of the time someone just wants a simple configuration to dump to a file ...

        Yes, that's the whole idea behind logging. It's just that today we force an "all or nothing" behavior for the code. Perhaps with the new InfoStream class (with its 'component' argument) this will be improved.

        Show
        Shai Erera added a comment - Please, lets not make this difficult for the 0.000001% of users with some complex use case. I think that the InfoStream solution is complex, and not vice versa. But that's just my opinion. And besides, not all of our code is consistent on the use of InfoStream. For instance, the facet module uses JDK logging (it came with it) in various classes. Do you suggest that we remove it and add setInfoStream API to all the classes that want to output logging info? I wouldn't want to do that ... Most of the time someone just wants a simple configuration to dump to a file ... Yes, that's the whole idea behind logging. It's just that today we force an "all or nothing" behavior for the code. Perhaps with the new InfoStream class (with its 'component' argument) this will be improved.
        Hide
        Robert Muir added a comment -

        My problem is this isn't logging but debugging. And since we are a library, we don't need to enforce logging on anyone.

        Otherwise we have to deal with overcomplicated logging frameworks, and people will start opening issues about what is/isn't logged.
        I don't think lucene should be responsible for logging, we should just leave that to the users to implement their own logging.

        Infostream is just an internal thing for debugging and testing, and we already have it hooked nicely into the tests now
        (note: since fixing this stuff I removed all the if (VERBOSE) setInfoStream... across tons of tests and added
        -Dtests.infostream to control it across the board, set automatically by tests.verbose but can be controlled separately)

        But this being said, I am for uwe's improvement because it makes it possible for someone to turn on/off infostream "live" themselves,
        without taking any performance hit from the calculations if they have it disabled, and they are responsible for their own
        implementation (if they want to make stuff volatile or synced, good for them). The complexity is all out of indexwriter's
        classes to manage this "liveness", and that was the whole reason for s/printstream/infostream/ in the first place.

        Show
        Robert Muir added a comment - My problem is this isn't logging but debugging. And since we are a library, we don't need to enforce logging on anyone. Otherwise we have to deal with overcomplicated logging frameworks, and people will start opening issues about what is/isn't logged. I don't think lucene should be responsible for logging, we should just leave that to the users to implement their own logging. Infostream is just an internal thing for debugging and testing, and we already have it hooked nicely into the tests now (note: since fixing this stuff I removed all the if (VERBOSE) setInfoStream... across tons of tests and added -Dtests.infostream to control it across the board, set automatically by tests.verbose but can be controlled separately) But this being said, I am for uwe's improvement because it makes it possible for someone to turn on/off infostream "live" themselves, without taking any performance hit from the calculations if they have it disabled, and they are responsible for their own implementation (if they want to make stuff volatile or synced, good for them). The complexity is all out of indexwriter's classes to manage this "liveness", and that was the whole reason for s/printstream/infostream/ in the first place.
        Hide
        Hoss Man added a comment -

        From a practical usage standpoint, i think it makes a lot more sense to add minor enhancements to the "infoStream" concept then it would to start using JUL or SLF4J (or any other logging API) in the core Lucene code.

        with infoStream, the default assumption has always been (and with this patch: seems to continue to be) that you want "fast" production code w/o the overhead of debugging info. the client app has to go out of the way to turn on the debugging info from lucene, recognizing that that may slow things down.

        I don't know of any logging API we could use where that default assumption would remain the status quo – JUL, JCL, L4J, SLF4J, etc... if we switched to using any of those the burden would be on client libraries to actively disable messages coming from Lucene either by adding a logging config file, or by updating their existing logging config file, or adding a NOOP SLF4J impl, etc... (even JCL defaults to writting to System.err instead of being a NOOP!)

        So lets keep simple/common things simple/common/fast – and people who want to hook into a logging framework can do that.

        Hell: since JUL is guaranteed to be implemented by all JVMs, and since we already have the Jar for the SLF4J API in the build system for Solr, it would be trivial to offer out of the box JUL and SLF4J backed implementations of the InfoStream API to make it really trivial for client apps to use them

        Show
        Hoss Man added a comment - From a practical usage standpoint, i think it makes a lot more sense to add minor enhancements to the "infoStream" concept then it would to start using JUL or SLF4J (or any other logging API) in the core Lucene code. with infoStream, the default assumption has always been (and with this patch: seems to continue to be) that you want "fast" production code w/o the overhead of debugging info. the client app has to go out of the way to turn on the debugging info from lucene, recognizing that that may slow things down. I don't know of any logging API we could use where that default assumption would remain the status quo – JUL, JCL, L4J, SLF4J, etc... if we switched to using any of those the burden would be on client libraries to actively disable messages coming from Lucene either by adding a logging config file, or by updating their existing logging config file, or adding a NOOP SLF4J impl, etc... (even JCL defaults to writting to System.err instead of being a NOOP!) So lets keep simple/common things simple/common/fast – and people who want to hook into a logging framework can do that. Hell: since JUL is guaranteed to be implemented by all JVMs, and since we already have the Jar for the SLF4J API in the build system for Solr, it would be trivial to offer out of the box JUL and SLF4J backed implementations of the InfoStream API to make it really trivial for client apps to use them
        Hide
        Uwe Schindler added a comment -

        @Hoss: Good idea, we can create JUL+SLF4J impls under contrib/misc with a compile-time dependency to SLF4J.

        I think the current patch is fine so I can commit it before it gets outdated? We can improve logging later by adding more classes using InfoStream but thats out of the scope of this issue.

        Show
        Uwe Schindler added a comment - @Hoss: Good idea, we can create JUL+SLF4J impls under contrib/misc with a compile-time dependency to SLF4J. I think the current patch is fine so I can commit it before it gets outdated? We can improve logging later by adding more classes using InfoStream but thats out of the scope of this issue.
        Hide
        Uwe Schindler added a comment -

        Committed trunk revision: 1209059

        For improvements like JUL/SLF4J convenience impl I will open other issues.

        Show
        Uwe Schindler added a comment - Committed trunk revision: 1209059 For improvements like JUL/SLF4J convenience impl I will open other issues.
        Hide
        Michael McCandless added a comment -

        I noticed a few accidental changes here (lost an important !success);
        I fixed those (rev 1209663), but while fixing them I noticed.... we
        are inconsistent on when we call infoStream.isEnabled? Sometimes we
        call it, sometimes we don't.

        I think we should always call it? I don't like that there's a sudden
        inconsistency (it was consistent before), and that now present &
        future coders working on our code must now wonder/think about when to
        check and when not to. I think the rule should be very simple:
        always call .isEnabled.

        And I don't think we should assume it's "cheap" to always call
        infoStream.message (ie, even when it's a single static string); the
        logger could somehow be costly, and had implemented isEnabled exactly
        for this reason (so that it's "free" to have a costly
        message impl, eg maybe sync'd or something).

        Once we fix this... then we can easily test for it by having
        NullInfoStream validate it's never called when it wasn't enabled.

        Show
        Michael McCandless added a comment - I noticed a few accidental changes here (lost an important !success); I fixed those (rev 1209663), but while fixing them I noticed.... we are inconsistent on when we call infoStream.isEnabled? Sometimes we call it, sometimes we don't. I think we should always call it? I don't like that there's a sudden inconsistency (it was consistent before), and that now present & future coders working on our code must now wonder/think about when to check and when not to. I think the rule should be very simple: always call .isEnabled. And I don't think we should assume it's "cheap" to always call infoStream.message (ie, even when it's a single static string); the logger could somehow be costly, and had implemented isEnabled exactly for this reason (so that it's "free" to have a costly message impl, eg maybe sync'd or something). Once we fix this... then we can easily test for it by having NullInfoStream validate it's never called when it wasn't enabled.
        Hide
        Uwe Schindler added a comment - - edited

        Hi Mike,
        sorry for the (!sucess) bugs, I have seen the patch.

        I think we should always call it? I don't like that there's a sudden
        inconsistency (it was consistent before), and that now present &
        future coders working on our code must now wonder/think about when to
        check and when not to. I think the rule should be very simple:
        always call .isEnabled.

        The idea behind InfoStream interface was to mimic logging frameworks, which accept calls to the log methods always. Only if the string creation is expensive, you may call isXxxEnabled().

        But I have no preference, if you like change it to require a call to isEnabled, do it. NullInfoStream has isEnabled() returning true (as we use it to actually test that all logging code actually works). But the singleton implementation InfoStream.NO_OUTPUT (the default setting) returns false in isEnabled() so you can add an "assert false" in its message() method.

        Did you once again go through the patch I committed and checked, trat no other bugs were introduced?

        Show
        Uwe Schindler added a comment - - edited Hi Mike, sorry for the (!sucess) bugs, I have seen the patch. I think we should always call it? I don't like that there's a sudden inconsistency (it was consistent before), and that now present & future coders working on our code must now wonder/think about when to check and when not to. I think the rule should be very simple: always call .isEnabled. The idea behind InfoStream interface was to mimic logging frameworks, which accept calls to the log methods always. Only if the string creation is expensive, you may call isXxxEnabled(). But I have no preference, if you like change it to require a call to isEnabled, do it. NullInfoStream has isEnabled() returning true (as we use it to actually test that all logging code actually works). But the singleton implementation InfoStream.NO_OUTPUT (the default setting) returns false in isEnabled() so you can add an "assert false" in its message() method. Did you once again go through the patch I committed and checked, trat no other bugs were introduced?
        Hide
        Michael McCandless added a comment -

        Patch, always calling .isEnabled before .message, and adding a new test-only AlwaysDisabledInfoStream, swapped in randomly 1/3rd of the time, that returns false always from isEnabled and then asserts false in its message() method.

        Show
        Michael McCandless added a comment - Patch, always calling .isEnabled before .message, and adding a new test-only AlwaysDisabledInfoStream, swapped in randomly 1/3rd of the time, that returns false always from isEnabled and then asserts false in its message() method.
        Hide
        Uwe Schindler added a comment -

        Mike, you dont need the AlwaysDisabledIS.

        Just add the assert in InfoStream.NO_OUTPUT

        Show
        Uwe Schindler added a comment - Mike, you dont need the AlwaysDisabledIS. Just add the assert in InfoStream.NO_OUTPUT
        Hide
        Michael McCandless added a comment -

        Aha! Excellent... I'll do that.

        Show
        Michael McCandless added a comment - Aha! Excellent... I'll do that.

          People

          • Assignee:
            Uwe Schindler
            Reporter:
            Uwe Schindler
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development