Details

    • Type: Test Test
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      We need custom logging to decipher tests with multiple core containers, cores, in a single JVM.

      1. SOLR-3157.patch
        23 kB
        Yonik Seeley
      2. jetty_threadgroup.patch
        3 kB
        Yonik Seeley

        Issue Links

          Activity

          Hide
          Hoss Man added a comment -

          IMO, it's actually higher importance that we have better logging for ourselves so we can more easily debug our tests.

          agreed ... but we shouldn't break the log format of the most important line Solr logs if we can avoid it (particularly since the format was chosen specifically to be easy to parse).

          i also don't understand why you think those changes to SolrCore.execute's "info" log make anything better? your change removed really useful info and made the messages harder to parse because it removed the consistent key=val pattern for the path & param vals.

          Maybe there's a way I can log different things when using the test formatter and restore the production log format to it's former glory.

          I don't understand why the structure of the string used in this one info message has anything to do with your goal of better test logging using the SolrLogFormatter? why can't that string format stay exactly identical regardless of whether it's in a test (and the record gets formated with all of your new thread and meatdata goodness) or not ?

          as things stand now (with your latest commit #1294911) these messages are still broken compared to 3x because they don't include the SolrCore name (the "logid" used to init the StringBuilder before you changed it) ... that seems pretty damn important (although i realize now i somehow didn't mention that in my earlier comment)

          3x example...
          
          Feb 28, 2012 5:38:33 PM org.apache.solr.core.SolrCore execute
          INFO: [core0] webapp=/solr path=/select/ params={q=*:*&sort=score+desc} hits=0 status=0 QTime=11 
          
          trunk example...
          
          Feb 28, 2012 5:39:02 PM org.apache.solr.core.SolrCore execute
          INFO: webapp=/solr path=/select/ params={q=*:*&sort=score+desc} hits=0 status=0 QTime=40 
          
          Show
          Hoss Man added a comment - IMO, it's actually higher importance that we have better logging for ourselves so we can more easily debug our tests. agreed ... but we shouldn't break the log format of the most important line Solr logs if we can avoid it (particularly since the format was chosen specifically to be easy to parse). i also don't understand why you think those changes to SolrCore.execute's "info" log make anything better? your change removed really useful info and made the messages harder to parse because it removed the consistent key=val pattern for the path & param vals. Maybe there's a way I can log different things when using the test formatter and restore the production log format to it's former glory. I don't understand why the structure of the string used in this one info message has anything to do with your goal of better test logging using the SolrLogFormatter? why can't that string format stay exactly identical regardless of whether it's in a test (and the record gets formated with all of your new thread and meatdata goodness) or not ? as things stand now (with your latest commit #1294911) these messages are still broken compared to 3x because they don't include the SolrCore name (the "logid" used to init the StringBuilder before you changed it) ... that seems pretty damn important (although i realize now i somehow didn't mention that in my earlier comment) 3x example... Feb 28, 2012 5:38:33 PM org.apache.solr.core.SolrCore execute INFO: [core0] webapp=/solr path=/select/ params={q=*:*&sort=score+desc} hits=0 status=0 QTime=11 trunk example... Feb 28, 2012 5:39:02 PM org.apache.solr.core.SolrCore execute INFO: webapp=/solr path=/select/ params={q=*:*&sort=score+desc} hits=0 status=0 QTime=40
          Hide
          Yonik Seeley added a comment -

          Log format changes seem a lot less important than actual request/response formats?
          IMO, it's actually higher importance that we have better logging for ourselves so we can more easily debug our tests.

          Maybe there's a way I can log different things when using the test formatter and restore the production log format to it's former glory.

          Show
          Yonik Seeley added a comment - Log format changes seem a lot less important than actual request/response formats? IMO, it's actually higher importance that we have better logging for ourselves so we can more easily debug our tests. Maybe there's a way I can log different things when using the test formatter and restore the production log format to it's former glory.
          Hide
          Hoss Man added a comment -

          Yonik: your changes in r1294212 break the SolrCore.execute log format conventions we've had in place since SOLR-267, which breaks some log processing code i have (and since the whole point of SOLR-267 was to make it easy for people to write log parses, i'm guessing i'm not the only one)

          Notably:

          • you changed the "path" and "params" key=val pairs so they no longer include the key, just the val – this doesn't really make sense to me since the whole point of those log lines is that they are suppose to be consistent and easy to parse.
          • you removed the webapp key=val pair completely (comment says "multiple webaps are no longer best practise" but that doesn't mean people don't use them or that we should just suddenly stop logging them.
          Show
          Hoss Man added a comment - Yonik: your changes in r1294212 break the SolrCore.execute log format conventions we've had in place since SOLR-267 , which breaks some log processing code i have (and since the whole point of SOLR-267 was to make it easy for people to write log parses, i'm guessing i'm not the only one) Notably: you changed the "path" and "params" key=val pairs so they no longer include the key, just the val – this doesn't really make sense to me since the whole point of those log lines is that they are suppose to be consistent and easy to parse. you removed the webapp key=val pair completely (comment says "multiple webaps are no longer best practise" but that doesn't mean people don't use them or that we should just suddenly stop logging them.
          Hide
          Yonik Seeley added a comment -

          Regarding jetty: note that we use a SocketConnector (and hence there is no associated visible thread pool we can do anything with). We don't use the NIO connector since it was causing intermittent errors in the tests.

          Show
          Yonik Seeley added a comment - Regarding jetty: note that we use a SocketConnector (and hence there is no associated visible thread pool we can do anything with). We don't use the NIO connector since it was causing intermittent errors in the tests.
          Hide
          Yonik Seeley added a comment -

          Committed (since this is just for tests now). We can incrementally improve... ideas welcome!
          You can disable by commenting out setupLogging here:

          public abstract class SolrTestCaseJ4 extends LuceneTestCase {
            @BeforeClass
            public static void beforeClassSolrTestCase() throws Exception {
              setupLogging();
          
          Show
          Yonik Seeley added a comment - Committed (since this is just for tests now). We can incrementally improve... ideas welcome! You can disable by commenting out setupLogging here: public abstract class SolrTestCaseJ4 extends LuceneTestCase { @BeforeClass public static void beforeClassSolrTestCase() throws Exception { setupLogging();
          Hide
          Yonik Seeley added a comment -

          Can you send me a patch where you tried to enforce a threadgroup for them?

          Saved by intellij's local history! (I had scrapped it since it wasn't working).

          This represents my last try and is prob more complex than needed, but I was taking shots in the dark (keeping the launching thread around forever in case that made a difference, etc).

          I experimented in SolrLogFormatter.main with thread groups first, and it seemed to work fine. Very disappointing that I couldn't get it to stick with jetty.

          Show
          Yonik Seeley added a comment - Can you send me a patch where you tried to enforce a threadgroup for them? Saved by intellij's local history! (I had scrapped it since it wasn't working). This represents my last try and is prob more complex than needed, but I was taking shots in the dark (keeping the launching thread around forever in case that made a difference, etc). I experimented in SolrLogFormatter.main with thread groups first, and it seemed to work fine. Very disappointing that I couldn't get it to stick with jetty.
          Hide
          Dawid Weiss added a comment -

          I tried setting a ThreadGroup for the jetty instances we spin up, but they didn't stick (request threads were still part of the main thread group). Still, I think this is moving in the right direction.

          Interesting. I've been playing with that randomizedrunner of mine in the background an jetty connection pool threads surely do leak out of tests (and cause their failure), which would suggest they do inherit their parent thread's thread group.

          Can you send me a patch where you tried to enforce a threadgroup for them? I'd like to take a look.

          Show
          Dawid Weiss added a comment - I tried setting a ThreadGroup for the jetty instances we spin up, but they didn't stick (request threads were still part of the main thread group). Still, I think this is moving in the right direction. Interesting. I've been playing with that randomizedrunner of mine in the background an jetty connection pool threads surely do leak out of tests (and cause their failure), which would suggest they do inherit their parent thread's thread group. Can you send me a patch where you tried to enforce a threadgroup for them? I'd like to take a look.
          Hide
          Yonik Seeley added a comment - - edited

          Here's a patch that changes the log format for tests to something that should be easier to debug tests with multiple core containers.

          Example:

          18808 T=64 C3 UPDATE /update {waitSearcher=true&wt=javabin&wt=javabin&commit_end_point=true&commit=true&version=2&version=2} {commit=} 0 60
          18809 T=61 C2 UPDATE /update {waitSearcher=true&wt=javabin&commit=true&version=2} {commit=} 0 94
          18810 T=1 oascc.ZkStateReader.updateCloudState Manual update of cluster state initiated
          18811 T=1 oascc.ZkStateReader.updateCloudState Updating cloud state from ZooKeeper... 
          18855 T=99 C1 REQ /select {distrib=false&wt=javabin&q=*:*&version=2} hits=190 status=0 QTime=35 
          18864 T=100 C3 REQ /select {distrib=false&wt=javabin&q=*:*&version=2} hits=189 status=0 QTime=1 
          18868 T=101 C2 REQ /select {distrib=false&wt=javabin&q=*:*&version=2} hits=180 status=0 QTime=1 
          
          • first number is milliseconds since the start of the test. Makes it easy to sort by the first number or subtract to get elapsed time, and is more relevant to a short running test.
          • T=101 is the thread ID... this will be useful for any multi-threaded test.
          • C2 is the core number. we don't use core name since there can be multiple core containers with the same name (like "") in these distributed tests.
          • package names just use the first letters catenated together for brevity
          • aliases for common log output are implemented (REQ and UPDATE)
          • I got rid of the double logging for updates
          • single line format is easier to read and easier to grep and do things with scripts
          • the first time a core is seen (or whenever it's cloudstate changes) that is output. This allows correlation of C2 with URLs.
            For example, if we want to know about core C2, we search back in the logs for C2_STATE and see
            ASYNC  NEW_CORE C2 name= org.apache.solr.core.SolrCore@22e38fca url=http://127.0.0.1:58141/solr/ node=127.0.0.1:58141_solr C2_STATE=coll:collection1 core: props:{num_shards=3, shard=shard3, state=active, core=, collection=collection1, node_name=127.0.0.1:58141_solr, base_url=http://127.0.0.1:58141/solr}
            

          We can only currently do core / container correlation when SolrRequestInfo is set (i.e. request handler requests). I tried setting a ThreadGroup for the jetty instances we spin up, but they didn't stick (request threads were still part of the main thread group). Still, I think this is moving in the right direction.

          Show
          Yonik Seeley added a comment - - edited Here's a patch that changes the log format for tests to something that should be easier to debug tests with multiple core containers. Example: 18808 T=64 C3 UPDATE /update {waitSearcher= true &wt=javabin&wt=javabin&commit_end_point= true &commit= true &version=2&version=2} {commit=} 0 60 18809 T=61 C2 UPDATE /update {waitSearcher= true &wt=javabin&commit= true &version=2} {commit=} 0 94 18810 T=1 oascc.ZkStateReader.updateCloudState Manual update of cluster state initiated 18811 T=1 oascc.ZkStateReader.updateCloudState Updating cloud state from ZooKeeper... 18855 T=99 C1 REQ /select {distrib= false &wt=javabin&q=*:*&version=2} hits=190 status=0 QTime=35 18864 T=100 C3 REQ /select {distrib= false &wt=javabin&q=*:*&version=2} hits=189 status=0 QTime=1 18868 T=101 C2 REQ /select {distrib= false &wt=javabin&q=*:*&version=2} hits=180 status=0 QTime=1 first number is milliseconds since the start of the test. Makes it easy to sort by the first number or subtract to get elapsed time, and is more relevant to a short running test. T=101 is the thread ID... this will be useful for any multi-threaded test. C2 is the core number. we don't use core name since there can be multiple core containers with the same name (like "") in these distributed tests. package names just use the first letters catenated together for brevity aliases for common log output are implemented (REQ and UPDATE) I got rid of the double logging for updates single line format is easier to read and easier to grep and do things with scripts the first time a core is seen (or whenever it's cloudstate changes) that is output. This allows correlation of C2 with URLs. For example, if we want to know about core C2, we search back in the logs for C2_STATE and see ASYNC NEW_CORE C2 name= org.apache.solr.core.SolrCore@22e38fca url=http: //127.0.0.1:58141/solr/ node=127.0.0.1:58141_solr C2_STATE=coll:collection1 core: props:{num_shards=3, shard=shard3, state=active, core=, collection=collection1, node_name=127.0.0.1:58141_solr, base_url=http://127.0.0.1:58141/solr} We can only currently do core / container correlation when SolrRequestInfo is set (i.e. request handler requests). I tried setting a ThreadGroup for the jetty instances we spin up, but they didn't stick (request threads were still part of the main thread group). Still, I think this is moving in the right direction.
          Hide
          Yonik Seeley added a comment -

          I took a quick shot by creating a custom formatter.
          I tried making the following changes to the testlogging.properties file

          +#java.util.logging.ConsoleHandler.formatter=java.util.logging.SimpleFormatter
          +java.util.logging.ConsoleHandler.formatter=org.apache.solr.SolrLogFormatter
          

          But the custom formatter is not invoked.

          Show
          Yonik Seeley added a comment - I took a quick shot by creating a custom formatter. I tried making the following changes to the testlogging.properties file +#java.util.logging.ConsoleHandler.formatter=java.util.logging.SimpleFormatter +java.util.logging.ConsoleHandler.formatter=org.apache.solr.SolrLogFormatter But the custom formatter is not invoked.

            People

            • Assignee:
              Unassigned
              Reporter:
              Yonik Seeley
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:

                Development