Solr
  1. Solr
  2. SOLR-1145

Patch to set IndexWriter.defaultInfoStream from solr.xml

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4
    • Component/s: None
    • Labels:
      None

      Description

      Lucene IndexWriters use an infoStream to log detailed info about indexing operations for debugging purpose. This patch is an extremely simple way to allow logging this info to a file from within Solr: After applying the patch, set the new "defaultInfoStreamFilePath" attribute of the solr element in solr.xml to the path of the file where you'd like to save the logging information.

      Note that, in a multi-core setup, all cores will end up logging to the same infoStream log file. This may not be desired. (But it does justify putting the setting in solr.xml rather than solrconfig.xml.)

      1. SOLR-1145.patch
        5 kB
        Mark Miller
      2. SOLR-1145.patch
        4 kB
        Mark Miller
      3. SOLR-1145.patch
        4 kB
        Chris Harris
      4. SOLR-1145.patch
        1 kB
        Chris Harris

        Activity

        Hide
        Yonik Seeley added a comment -

        This would be very useful to get in for 1.4 to help with indexing issues people may hit.

        Show
        Yonik Seeley added a comment - This would be very useful to get in for 1.4 to help with indexing issues people may hit.
        Hide
        Chris Harris added a comment -

        I found it confusing not to have any timestamp data in the infostream log. This new version prints a timestamp along with each infoStream message. You can configure the timestamp format, though there's currently no way to disable timestamping.

        Show
        Chris Harris added a comment - I found it confusing not to have any timestamp data in the infostream log. This new version prints a timestamp along with each infoStream message. You can configure the timestamp format, though there's currently no way to disable timestamping.
        Hide
        Chris Harris added a comment -

        Oops. Infostream logging should not be enabled by default in example/multicore

        Show
        Chris Harris added a comment - Oops. Infostream logging should not be enabled by default in example/multicore
        Hide
        Mark Miller added a comment -

        Couple thoughts:

        Great idea - lets put this in for sure.

        I think the config actually does belong in solrconfig.xml though. I also think we should not be using the static setDefaultInfostream, but rather the per instance setting. Then you can set a debug info file per core.

        Show
        Mark Miller added a comment - Couple thoughts: Great idea - lets put this in for sure. I think the config actually does belong in solrconfig.xml though. I also think we should not be using the static setDefaultInfostream, but rather the per instance setting. Then you can set a debug info file per core.
        Hide
        Otis Gospodnetic added a comment -

        I agree about this belonging to solrconfig.xml – I bet < 50% people use multicore.

        Show
        Otis Gospodnetic added a comment - I agree about this belonging to solrconfig.xml – I bet < 50% people use multicore.
        Hide
        Mark Miller added a comment -

        So I think it should be more like this - per core settings.

        Or perhaps a mix of our two patches?

        Show
        Mark Miller added a comment - So I think it should be more like this - per core settings. Or perhaps a mix of our two patches?
        Hide
        Chris Harris added a comment -

        So I think it should be more like this - per core settings.

        That works fine for me. I didn't actually think doing it per core would be that easy, which is part of why I started out with the process-wide approach.

        Any ideas about if the TimeLoggingPrintStream is going to slow things down too much? I didn't do any kind of benchmarking with that.

        Show
        Chris Harris added a comment - So I think it should be more like this - per core settings. That works fine for me. I didn't actually think doing it per core would be that easy, which is part of why I started out with the process-wide approach. Any ideas about if the TimeLoggingPrintStream is going to slow things down too much? I didn't do any kind of benchmarking with that.
        Hide
        Hoss Man added a comment -

        adding support for setInfoStream seems like a good idea, but specifying a filename in the config file and writing a new TimeLoggingPrintStream class smells like reinventing logging.

        It seems like it would be simpler just to add a new boolean "infoStream" config option, and then leverage the existing logging framework by constructing an instance of a PrintStream that just proxies to SLF4J when the boolean property is true. A PrintStream instance that writes to a Logger seems no harder to implement then an PrintStream that knows how to prepend messages with a timestamp – and by using the standard logging framework, solr admins can customize it however they want.

        It looks like someone may have already thought this type of problem through a lot more indepth then us anyway...
        http://bugzilla.slf4j.org/show_bug.cgi?id=110

        Show
        Hoss Man added a comment - adding support for setInfoStream seems like a good idea, but specifying a filename in the config file and writing a new TimeLoggingPrintStream class smells like reinventing logging. It seems like it would be simpler just to add a new boolean "infoStream" config option, and then leverage the existing logging framework by constructing an instance of a PrintStream that just proxies to SLF4J when the boolean property is true. A PrintStream instance that writes to a Logger seems no harder to implement then an PrintStream that knows how to prepend messages with a timestamp – and by using the standard logging framework, solr admins can customize it however they want. It looks like someone may have already thought this type of problem through a lot more indepth then us anyway... http://bugzilla.slf4j.org/show_bug.cgi?id=110
        Hide
        Mark Miller added a comment - - edited

        I see your point Hoss.

        But at the same time, this is a special debug log that I would likely want to see only when I specifically turn it on to debug (usually for a Lucene dev to look at). Likely I just want to see the log in isolation - Mike M will prob have a hard time looking through it mixed in with all of the Solr logging

        Now you can do that with log4j/jdk logging, but its a hassle. Youd have to setup a config that captured the right package, and setup a file for it to write to... and it would likely be a hassle for people when I said, hey just turn on the IndexWriter debug stream, and setup your logging config to isolate it to a single file, and ...

        And it just seems like you would almost never want the info dispersed with everything else in the standard log.

        TimeLoggingPrintStream isnt really to reinvent logging - its just to shove in a timestamp as I guess Lucene doesn't.

        So, I see your point, and I'm not really arguing against yet (Likely you have good counter points), but thats the way I see it at the moment...

        Show
        Mark Miller added a comment - - edited I see your point Hoss. But at the same time, this is a special debug log that I would likely want to see only when I specifically turn it on to debug (usually for a Lucene dev to look at). Likely I just want to see the log in isolation - Mike M will prob have a hard time looking through it mixed in with all of the Solr logging Now you can do that with log4j/jdk logging, but its a hassle. Youd have to setup a config that captured the right package, and setup a file for it to write to... and it would likely be a hassle for people when I said, hey just turn on the IndexWriter debug stream, and setup your logging config to isolate it to a single file, and ... And it just seems like you would almost never want the info dispersed with everything else in the standard log. TimeLoggingPrintStream isnt really to reinvent logging - its just to shove in a timestamp as I guess Lucene doesn't. So, I see your point, and I'm not really arguing against yet (Likely you have good counter points), but thats the way I see it at the moment...
        Hide
        Mark Miller added a comment -

        Also, with lots of cores, even if you used the logging config to isolate the infostream log, it would be very difficult to isolate each core to its own file. And again, it seems like it would be difficult to deal with the log with multiple cores mixing up the IndexWriter debug logging in the same file.

        Show
        Mark Miller added a comment - Also, with lots of cores, even if you used the logging config to isolate the infostream log, it would be very difficult to isolate each core to its own file. And again, it seems like it would be difficult to deal with the log with multiple cores mixing up the IndexWriter debug logging in the same file.
        Hide
        Hoss Man added a comment -

        Mark: all your points are valid.

        Do me one favor: instead of this...

        <infoStreamFile>/path/to/infostream/file</infoStreamFile>
        

        Make the config sequence something like...

        <infoStream file="/path/to/infostream/file" />
        

        ...so we don't paint ourselves into a corner, and we can easily add logging based infostream option later if we want. (using a new 'logger="..."' xml attribute instead of 'file')

        On a related note, Having an explicit true/false (ie: on/off) option seems like a good idea so it's easy to control with a system property at startup....

        <infoStream file="/path/to/infostream/file">${solr.infostream:true}</infostream>
        
        Show
        Hoss Man added a comment - Mark: all your points are valid. Do me one favor: instead of this... <infoStreamFile>/path/to/infostream/file</infoStreamFile> Make the config sequence something like... <infoStream file= "/path/to/infostream/file" /> ...so we don't paint ourselves into a corner, and we can easily add logging based infostream option later if we want. (using a new 'logger="..."' xml attribute instead of 'file') On a related note, Having an explicit true/false (ie: on/off) option seems like a good idea so it's easy to control with a system property at startup.... <infoStream file= "/path/to/infostream/file" >${solr.infostream: true }</infostream>
        Hide
        Mark Miller added a comment -

        Changes made plus some other polish/finish

        Show
        Mark Miller added a comment - Changes made plus some other polish/finish
        Hide
        Noble Paul added a comment -

        I am moving all the xpath reading to solrconfig. I guess this too can go to solrconfig. SOLR-1198

        Show
        Noble Paul added a comment - I am moving all the xpath reading to solrconfig. I guess this too can go to solrconfig. SOLR-1198
        Hide
        Mark Miller added a comment -

        +1

        Show
        Mark Miller added a comment - +1
        Hide
        Mark Miller added a comment -

        Thanks Chris!

        Show
        Mark Miller added a comment - Thanks Chris!
        Show
        Yonik Seeley added a comment - Reopening... I can't get it to work either. http://search.lucidimagination.com/search/document/9cd31d75526236f1/indexwriter_infostream_in_solrconfig_not_working
        Hide
        Yonik Seeley added a comment -

        OK, I found it... the example has infoStream in the indexDefaults section and not the mainIndex section.

        Not sure if intoStream should be supported as a default or not.
        Perhaps just move the example to mainIndex for now?

        Show
        Yonik Seeley added a comment - OK, I found it... the example has infoStream in the indexDefaults section and not the mainIndex section. Not sure if intoStream should be supported as a default or not. Perhaps just move the example to mainIndex for now?
        Hide
        Yonik Seeley added a comment -

        committed changes to example, as well as a NPE fix for files without a parent specified.

        Show
        Yonik Seeley added a comment - committed changes to example, as well as a NPE fix for files without a parent specified.
        Hide
        Grant Ingersoll added a comment -

        Bulk close for Solr 1.4

        Show
        Grant Ingersoll added a comment - Bulk close for Solr 1.4

          People

          • Assignee:
            Mark Miller
            Reporter:
            Chris Harris
          • Votes:
            1 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development