Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 1.3
    • Fix Version/s: None
    • Component/s: search
    • Labels:
      None

      Description

      adds a logger to log handler, query string and hit counts for each query

      1. SOLR-267.patch
        18 kB
        Grant Ingersoll
      2. SOLR-267.patch
        18 kB
        Grant Ingersoll
      3. SOLR-267.patch
        3 kB
        Grant Ingersoll
      4. LogQueryHitCounts.patch
        2 kB
        Will Johnson
      5. LogQueryHitCounts.patch
        1 kB
        Will Johnson
      6. LogQueryHitCounts.patch
        1 kB
        Yonik Seeley
      7. LogQueryHitCounts.patch
        2 kB
        Will Johnson
      8. LogQueryHitCounts.patch
        5 kB
        Will Johnson
      9. LogQueryHitCounts.patch
        5 kB
        Will Johnson
      10. LogQueryHitCounts.patch
        7 kB
        Will Johnson

        Issue Links

          Activity

          Hide
          Will Johnson added a comment -

          hit a random key a little fast on the last post. the attached patch adds a logger to the Standard and DisMax request handlers to log the hander name, query string and hit count for each query.

          Show
          Will Johnson added a comment - hit a random key a little fast on the last post. the attached patch adds a logger to the Standard and DisMax request handlers to log the hander name, query string and hit count for each query.
          Hide
          Yonik Seeley added a comment -

          I think in the future we could handle this in a more generic manner, but for now I'd support a more generic version implemented in SolrCore.execute()
          It could check for a DocList in element #1 and if so, use that.
          That would be faster than a separate logging statement, and the query (and all other parameters passed in) are already logged in execute().

          I think we need a separate key=value section for results... and the number of results would appear as "hits=941"

          Show
          Yonik Seeley added a comment - I think in the future we could handle this in a more generic manner, but for now I'd support a more generic version implemented in SolrCore.execute() It could check for a DocList in element #1 and if so, use that. That would be faster than a separate logging statement, and the query (and all other parameters passed in) are already logged in execute(). I think we need a separate key=value section for results... and the number of results would appear as "hits=941"
          Hide
          Will Johnson added a comment -

          updated patch to work in SolrCore.execute() instead. i annotated the msg with hits=## as requested however i left time unlabeled for backwards compatibility and i had no idea what the static '0' was but i left it there just to be safe as well. i think i tmight be good to clean that up and i'm happy to but i don't know who or how those numbers are being used today.

          Show
          Will Johnson added a comment - updated patch to work in SolrCore.execute() instead. i annotated the msg with hits=## as requested however i left time unlabeled for backwards compatibility and i had no idea what the static '0' was but i left it there just to be safe as well. i think i tmight be good to clean that up and i'm happy to but i don't know who or how those numbers are being used today.
          Hide
          Yonik Seeley added a comment -

          The static "0" is return code... no exception was thrown if we are here, so it was a success.

          I think the last thing should remain the request time for back compatibility and easy scanning.
          The result parameters should go after the input parameters, separated by some sort of delimiter
          to be nice to machine parsers.

          Show
          Yonik Seeley added a comment - The static "0" is return code... no exception was thrown if we are here, so it was a success. I think the last thing should remain the request time for back compatibility and easy scanning. The result parameters should go after the input parameters, separated by some sort of delimiter to be nice to machine parsers.
          Hide
          Yonik Seeley added a comment -

          Attaching version with small changes...

          This produces log messages that look like this:
          INFO: /select q=solr&wt=python&indent=on hits=1 0 94

          If there was no DocSet, it would look like this:
          INFO: /select q=solr&wt=python&indent=on - 0 94

          The '-' is a placeholder so the number of fields will be the same for different requests.

          I checked, and partialUrlEncode does encode spaces, so a space separator in the log is sufficient.

          Show
          Yonik Seeley added a comment - Attaching version with small changes... This produces log messages that look like this: INFO: /select q=solr&wt=python&indent=on hits=1 0 94 If there was no DocSet, it would look like this: INFO: /select q=solr&wt=python&indent=on - 0 94 The '-' is a placeholder so the number of fields will be the same for different requests. I checked, and partialUrlEncode does encode spaces, so a space separator in the log is sufficient.
          Hide
          Will Johnson added a comment -

          slight update to log the webapp name which is set in the SolrDispatchFilter. this lets you distinguish between multiple solr instances for tracking purposes.

          log output now looks like:

          Jun 21, 2007 10:31:05 AM org.apache.solr.core.SolrCore execute
          INFO: /solr /select/ indent=on&start=0&q=:&version=2.2&rows=10 hits=5 0 62

          Show
          Will Johnson added a comment - slight update to log the webapp name which is set in the SolrDispatchFilter. this lets you distinguish between multiple solr instances for tracking purposes. log output now looks like: Jun 21, 2007 10:31:05 AM org.apache.solr.core.SolrCore execute INFO: /solr /select/ indent=on&start=0&q= : &version=2.2&rows=10 hits=5 0 62
          Hide
          Hoss Man added a comment -

          I'm really not a fan of this approach to logging the number of hits via SolrCore.execute ... this is request handler specific information, it should be logged by the request handler.

          if people really want this type of stuff to be logged by SolrCore execute so that there's only one log message with the timing info and status and such, then let's come up with a way for the RequestHandler to explicitly specify in the SolrQueryResponse the key-val pairs it wants logged (either via new methods on SOlrQueryResponse, or via apecific name at the top level of the response) and modify the out-of-the-box request handlers to take advantage of this

          instead of...

          + String resultLog = " -";
          + if (rsp.getValues().size()>1 && rsp.getValues().getVal(1) instanceof DocList)

          { + int hits = ((DocList) rsp.getValues().getVal(1)).size(); + resultLog = " hits=" + hits; + }

          ...something like...

          StringBuffer resultLog = new StringBuffer(" -");
          Map<String,Object> logables = rsp.getValues().get("loginfo");
          for (String k : logables.keySet())

          { resultLog.append(" " + k + "=" + logables.get(k)); }
          Show
          Hoss Man added a comment - I'm really not a fan of this approach to logging the number of hits via SolrCore.execute ... this is request handler specific information, it should be logged by the request handler. if people really want this type of stuff to be logged by SolrCore execute so that there's only one log message with the timing info and status and such, then let's come up with a way for the RequestHandler to explicitly specify in the SolrQueryResponse the key-val pairs it wants logged (either via new methods on SOlrQueryResponse, or via apecific name at the top level of the response) and modify the out-of-the-box request handlers to take advantage of this instead of... + String resultLog = " -"; + if (rsp.getValues().size()>1 && rsp.getValues().getVal(1) instanceof DocList) { + int hits = ((DocList) rsp.getValues().getVal(1)).size(); + resultLog = " hits=" + hits; + } ...something like... StringBuffer resultLog = new StringBuffer(" -"); Map<String,Object> logables = rsp.getValues().get("loginfo"); for (String k : logables.keySet()) { resultLog.append(" " + k + "=" + logables.get(k)); }
          Hide
          Will Johnson added a comment -

          One thing that comes to mind is making the response header a standard
          part of the SolrQueryResponse object with get/set/add methods then the
          log message can just print out what ever is in the response header. With
          trunk, it already contains much of the same data (status, qtime,
          params). The only issue is that in order to keep things 'clean' the
          output would change to being fully labeled:

          webapp=/solr path=/select/ status=0 QTime=63
          params=

          {indent=on,start=0,q=*:*,version=2.2,rows=10}

          myotherheader=foo

          In general I think this makes things much cleaner and easier to read but
          it does break backwards compatibility for log parsing purposes. Any
          other ideas?

          • will
          Show
          Will Johnson added a comment - One thing that comes to mind is making the response header a standard part of the SolrQueryResponse object with get/set/add methods then the log message can just print out what ever is in the response header. With trunk, it already contains much of the same data (status, qtime, params). The only issue is that in order to keep things 'clean' the output would change to being fully labeled: webapp=/solr path=/select/ status=0 QTime=63 params= {indent=on,start=0,q=*:*,version=2.2,rows=10} myotherheader=foo In general I think this makes things much cleaner and easier to read but it does break backwards compatibility for log parsing purposes. Any other ideas? will
          Hide
          Will Johnson added a comment -

          new patch to promote responseHeader from a defacto standard to an api standard in SolrQueryResponse. this enables the SolrCore.execute() method to simply print out it's contents containing any info people want logged. the format now is:

          Jun 22, 2007 10:37:25 AM org.apache.solr.core.SolrCore execute
          INFO: webapp=/solr path=/select/ hits=0 status=0 QTime=0 params=

          {indent=on,start=0,q=solr,version=2.2,rows=10}

          which is fully labeled but i think mkaes things much easier to read/parse as you can look for labels instead of positions which may or may not change.

          Show
          Will Johnson added a comment - new patch to promote responseHeader from a defacto standard to an api standard in SolrQueryResponse. this enables the SolrCore.execute() method to simply print out it's contents containing any info people want logged. the format now is: Jun 22, 2007 10:37:25 AM org.apache.solr.core.SolrCore execute INFO: webapp=/solr path=/select/ hits=0 status=0 QTime=0 params= {indent=on,start=0,q=solr,version=2.2,rows=10} which is fully labeled but i think mkaes things much easier to read/parse as you can look for labels instead of positions which may or may not change.
          Hide
          Ian Holsman added a comment -

          a couple of comments on this patch.

          1. how is this going to look when the query is ~200-1000 characters long? you might need/want to put in some quotes are the query.

          2. is it possible to put the metrics into the response header so that tomcat's logging will be able to grab it (similar to how SOLR-232 puts it in)

          Show
          Ian Holsman added a comment - a couple of comments on this patch. 1. how is this going to look when the query is ~200-1000 characters long? you might need/want to put in some quotes are the query. 2. is it possible to put the metrics into the response header so that tomcat's logging will be able to grab it (similar to how SOLR-232 puts it in)
          Hide
          Yonik Seeley added a comment -

          I liked the params that I could paste back into my browser...

          Show
          Yonik Seeley added a comment - I liked the params that I could paste back into my browser...
          Hide
          Yonik Seeley added a comment -

          > I'm really not a fan of this approach to logging the number of hits via SolrCore.execute ... this is request handler specific information, it should be logged by the request handler.

          There is something to be said for automatically logging certain things for all request handlers in a certain category (those that produce a "base" DocList or DocSet) rather than requiring each handler to worry about logging code also.

          The implementation is slightly hackish at this point, but not too bad. If/when we go to a QueryComponent chain, it would be cleaner to get "hits".

          Show
          Yonik Seeley added a comment - > I'm really not a fan of this approach to logging the number of hits via SolrCore.execute ... this is request handler specific information, it should be logged by the request handler. There is something to be said for automatically logging certain things for all request handlers in a certain category (those that produce a "base" DocList or DocSet) rather than requiring each handler to worry about logging code also. The implementation is slightly hackish at this point, but not too bad. If/when we go to a QueryComponent chain, it would be cleaner to get "hits".
          Hide
          Will Johnson added a comment -

          A few response rolled up:

          Yonik Seeley commented on SOLR-267:
          -----------------------------------

          After having used this for a ~week now I kind of do too. I can work on
          a patch that switches that log component back unless someone else (who
          wants it more) beats me to it.

          "hits".

          Agreed, I'd love to have query pipelines and indexing pipelines for
          processing logic but that's a much bigger effort. At the moment it's
          only 1 line extra in each of the 'real' query handlers which doesn't
          seem too bad.

          Ian Holsman commented on SOLR-267:
          ----------------------------------

          long? >you might need/want to put in some quotes are the query.

          It will look very long As long as there are no spaces which the url
          encoding should handle I think things are ok (this assumes we're going
          to switch back to cgi params)

          it >in)

          Not that I know how to do. Since the dispatch filter is a filter not a
          servlet it doesn't have access to an HttpServletResponse, only a
          ServletResponse which means it can't set HttpHeaders. This was my
          original idea for how to solve this problem and seems a bit more
          'standard' anyways but I hit a dead end without getting more hackish
          than usual.

          • will
          Show
          Will Johnson added a comment - A few response rolled up: Yonik Seeley commented on SOLR-267 : ----------------------------------- After having used this for a ~week now I kind of do too. I can work on a patch that switches that log component back unless someone else (who wants it more) beats me to it. "hits". Agreed, I'd love to have query pipelines and indexing pipelines for processing logic but that's a much bigger effort. At the moment it's only 1 line extra in each of the 'real' query handlers which doesn't seem too bad. Ian Holsman commented on SOLR-267 : ---------------------------------- long? >you might need/want to put in some quotes are the query. It will look very long As long as there are no spaces which the url encoding should handle I think things are ok (this assumes we're going to switch back to cgi params) it >in) Not that I know how to do. Since the dispatch filter is a filter not a servlet it doesn't have access to an HttpServletResponse, only a ServletResponse which means it can't set HttpHeaders. This was my original idea for how to solve this problem and seems a bit more 'standard' anyways but I hit a dead end without getting more hackish than usual. will
          Hide
          Will Johnson added a comment -

          new patch that writes out request params as cgi instead of NL.toString() for pasting into a browser. i also figured out the HttpResponseHeader however i'm not sure how people feel about having that info duplicated in teh solr logs, the http headers/access logs and the actual solr response. in any case the following logic would go into SolrDispatchFilter: (but is not in this patch)

          Show
          Will Johnson added a comment - new patch that writes out request params as cgi instead of NL.toString() for pasting into a browser. i also figured out the HttpResponseHeader however i'm not sure how people feel about having that info duplicated in teh solr logs, the http headers/access logs and the actual solr response. in any case the following logic would go into SolrDispatchFilter: (but is not in this patch)
          Hide
          Yonik Seeley added a comment -

          The double logging issue with the new update handler framework should be fixed with whatever mechanism is created here.

          Using the responseHeader as logging info is clever, but I'm not sure if we want to be bound to return everything in the response that we want to be logged (for updates, some may want all of the ids logged but not returned, and if they are returned, not in the header).

          So, what about either a "NamedList toLog;"on the SolrQueryResponse, or SolrQueryRequest.getContext().get("log") => NamedList ?

          > new patch to promote responseHeader from a defacto standard to an api standard in SolrQueryResponse.

          I think that's probably OK

          Show
          Yonik Seeley added a comment - The double logging issue with the new update handler framework should be fixed with whatever mechanism is created here. Using the responseHeader as logging info is clever, but I'm not sure if we want to be bound to return everything in the response that we want to be logged (for updates, some may want all of the ids logged but not returned, and if they are returned, not in the header). So, what about either a "NamedList toLog;"on the SolrQueryResponse, or SolrQueryRequest.getContext().get("log") => NamedList ? > new patch to promote responseHeader from a defacto standard to an api standard in SolrQueryResponse. I think that's probably OK
          Hide
          Will Johnson added a comment -

          new path produces the following output:

          Jul 11, 2007 1:35:19 PM org.apache.solr.core.SolrCore execute
          INFO: webapp=/solr path=/select/ params=indent=on&start=0&q=solr&version=2.2&rows=10 hits=0 status=0 QTime=79

          and adds a NamedList toLog as yonik suggested.

          Show
          Will Johnson added a comment - new path produces the following output: Jul 11, 2007 1:35:19 PM org.apache.solr.core.SolrCore execute INFO: webapp=/solr path=/select/ params=indent=on&start=0&q=solr&version=2.2&rows=10 hits=0 status=0 QTime=79 and adds a NamedList toLog as yonik suggested.
          Hide
          Yonik Seeley added a comment -

          So what do people think of the logging format? Any objections?

          Show
          Yonik Seeley added a comment - So what do people think of the logging format? Any objections?
          Hide
          Hoss Man added a comment -

          format looks fine to me, some minor nits from a quick skim of the patch...

          1) "protected NamedList toLog" should have some javadocs.

          2) "public NamedList getResponseHeader()" seems to have incorrect javadocs

          3) I'm pretty sure the use HttpServletResponse.addHeader in SolrDispatchFilter requires that the header names not contains spaces ... that's not something currently guaranteed by solrRsp.getResponseHeader()

          4) although the odds of the ClassCastException in SolrDispatchFilter asymptotically approach zero, it should still be logged properly instead of using cce.printStackTrace()

          Show
          Hoss Man added a comment - format looks fine to me, some minor nits from a quick skim of the patch... 1) "protected NamedList toLog" should have some javadocs. 2) "public NamedList getResponseHeader()" seems to have incorrect javadocs 3) I'm pretty sure the use HttpServletResponse.addHeader in SolrDispatchFilter requires that the header names not contains spaces ... that's not something currently guaranteed by solrRsp.getResponseHeader() 4) although the odds of the ClassCastException in SolrDispatchFilter asymptotically approach zero, it should still be logged properly instead of using cce.printStackTrace()
          Hide
          Grant Ingersoll added a comment -

          Updated the patch to run using the new SearchComponent infrastructure. Also, made a slight change about how params are output so that it is more clear what the params are due to = being both the key/value separator for the log message and the key/value separator for the params.

          Plan to commit tonight or tomorrow.

          Show
          Grant Ingersoll added a comment - Updated the patch to run using the new SearchComponent infrastructure. Also, made a slight change about how params are output so that it is more clear what the params are due to = being both the key/value separator for the log message and the key/value separator for the params. Plan to commit tonight or tomorrow.
          Hide
          Yonik Seeley added a comment -

          > Also, made a slight change about how params are output so that it is more clear what the params are

          Can you give an example of what the output looks like now (and what impact multi-core has, if any)?

          Show
          Yonik Seeley added a comment - > Also, made a slight change about how params are output so that it is more clear what the params are Can you give an example of what the output looks like now (and what impact multi-core has, if any)?
          Hide
          Grant Ingersoll added a comment -

          Can you give an example of what the output looks like now (and what impact multi-core has, if any)?

          INFO: webapp=/solr path=/select/ params=

          {indent=on&rows=10&start=0&q=ipod&version=2.2}

          hits=3 status=0 QTime=8

          I will have to try w/ MultiCore

          Show
          Grant Ingersoll added a comment - Can you give an example of what the output looks like now (and what impact multi-core has, if any)? INFO: webapp=/solr path=/select/ params= {indent=on&rows=10&start=0&q=ipod&version=2.2} hits=3 status=0 QTime=8 I will have to try w/ MultiCore
          Hide
          Grant Ingersoll added a comment -

          Here's what happens on multicore:

          Apr 16, 2008 5:40:29 PM org.apache.solr.core.SolrCore execute
          INFO: webapp=/solr path=/select/ params={indent=on&rows=10&start=0&q=solr&version=2.2} hits=0 status=0 QTime=39 
          Apr 16, 2008 5:40:29 PM org.apache.solr.core.SolrCore execute
          INFO: [core0] /select/ indent=on&rows=10&start=0&q=solr&version=2.2 0 39
          
          Show
          Grant Ingersoll added a comment - Here's what happens on multicore: Apr 16, 2008 5:40:29 PM org.apache.solr.core.SolrCore execute INFO: webapp=/solr path=/select/ params={indent=on&rows=10&start=0&q=solr&version=2.2} hits=0 status=0 QTime=39 Apr 16, 2008 5:40:29 PM org.apache.solr.core.SolrCore execute INFO: [core0] /select/ indent=on&rows=10&start=0&q=solr&version=2.2 0 39
          Hide
          Grant Ingersoll added a comment -

          Update to remove double logging in SolrCore. Should now properly handle multicore, etc.

          Show
          Grant Ingersoll added a comment - Update to remove double logging in SolrCore. Should now properly handle multicore, etc.
          Hide
          Grant Ingersoll added a comment -

          Committed revision 649066.

          Show
          Grant Ingersoll added a comment - Committed revision 649066.
          Hide
          Grant Ingersoll added a comment -

          Something isn't quite right with regards to the responseHeader.

          Show
          Grant Ingersoll added a comment - Something isn't quite right with regards to the responseHeader.
          Hide
          Grant Ingersoll added a comment -

          OK, two things:

          1. The current commit adds responseHeader twice

          2. The changes to SolrDispatchFilter can screw up SolrJ when you have explicit=all.

          I don't know enough about how headers work, so I'm going to fix #1 and comment out #2 and put a TODO: there and someone can address it on SOLR-232

          Patch shortly

          Show
          Grant Ingersoll added a comment - OK, two things: 1. The current commit adds responseHeader twice 2. The changes to SolrDispatchFilter can screw up SolrJ when you have explicit=all. I don't know enough about how headers work, so I'm going to fix #1 and comment out #2 and put a TODO: there and someone can address it on SOLR-232 Patch shortly
          Hide
          Grant Ingersoll added a comment -

          Will commit shortly

          Show
          Grant Ingersoll added a comment - Will commit shortly
          Hide
          Grant Ingersoll added a comment -

          Committed revision 649185.

          Show
          Grant Ingersoll added a comment - Committed revision 649185.

            People

            • Assignee:
              Grant Ingersoll
              Reporter:
              Will Johnson
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development