Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-15783

MDC Values can leak between request threads, trace_id currently does

Attach filesAttach ScreenshotVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 9.0
    • None
    • None

    Description

      Currently, the only general "MDC Cleanup" code that happens from request to request is done by MDCLoggingContext.clear() and MDCLoggingContext.reset() ... but these methods (in addition to only being concerned with a static list of 5 specific MDC values) are currently designed and used in such a way that things like "trace_id" are only "reset" at the begining of HttpSolrCall.call() ... which means any request specific log messages that happen before that point (like authn errors) will incorrectly refer to the trace_id of whatever request the thread previously processed.

      As a trivial example of this, if you apply a patch like...

      diff --git a/solr/core/src/java/org/apache/solr/client/solrj/embedded/JettySolrRunner.java b/solr/core/src/java/org/apache/solr/client/solrj/embedded/JettySolrRunner.java
      index 5f481c48e92..97c864617f4 100644
      --- a/solr/core/src/java/org/apache/solr/client/solrj/embedded/JettySolrRunner.java
      +++ b/solr/core/src/java/org/apache/solr/client/solrj/embedded/JettySolrRunner.java
      @@ -109,7 +109,7 @@ public class JettySolrRunner {
       
         private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
       
      -  private static final int THREAD_POOL_MAX_THREADS = 10000;
      +  private static final int THREAD_POOL_MAX_THREADS = 10; // nocommit
         // NOTE: needs to be larger than SolrHttpClient.threadPoolSweeperMaxIdleTime
         private static final int THREAD_POOL_MAX_IDLE_TIME_MS = 260000;
       
      diff --git a/solr/core/src/java/org/apache/solr/servlet/SolrDispatchFilter.java b/solr/core/src/java/org/apache/solr/servlet/SolrDispatchFilter.java
      index 28167270099..fa6bb9a4eb8 100644
      --- a/solr/core/src/java/org/apache/solr/servlet/SolrDispatchFilter.java
      +++ b/solr/core/src/java/org/apache/solr/servlet/SolrDispatchFilter.java
      @@ -484,6 +484,9 @@ public class SolrDispatchFilter extends BaseSolrFilter {
               response.sendError(429, errorMessage);
             }
       
      +      log.error("nocommit: fake log message before HttpSolrCall.call() as if auth error happened: MDC_claims={} ; real_traceId={}",
      +                org.slf4j.MDC.get(org.apache.solr.logging.MDCLoggingContext.TRACE_ID), span.context().toTraceId());
      +      
             AtomicReference<HttpServletRequest> wrappedRequest = new AtomicReference<>();
             if (!authenticateRequest(request, response, wrappedRequest)) { // the response and status code have already been sent
               return;
      

      ...and then run ./gradlew test --tests TestDistributedTracing -Ptests.verbose=true you can pick out any single jetty query thread and see something like...

        2> 4103 ERROR (qtp1310066330-70) [] o.a.s.s.SolrDispatchFilter nocommit: fake log message before HttpSolrCall.call() as if auth error happened: MDC_claims=null ; real_traceId=1
        2> 6146 ERROR (qtp1310066330-70) [n:127.0.0.1:34135_solr t:1] o.a.s.s.SolrDispatchFilter nocommit: fake log message before HttpSolrCall.call() as if auth error happened: MDC_claims=1 ; real_traceId=22
        2> 6661 ERROR (qtp1310066330-70) [n:127.0.0.1:34135_solr t:22] o.a.s.s.SolrDispatchFilter nocommit: fake log message before HttpSolrCall.call() as if auth error happened: MDC_claims=22 ; real_traceId=50
      

      (Note also that during authn, the nodeId isn't populated for a 'new' thread, it's only set if it was leaked from the last request processed)

      Attachments

        Issue Links

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            hossman Chris M. Hostetter
            hossman Chris M. Hostetter
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment