Solr
  1. Solr
  2. SOLR-2893

NPE in JSONWriter during Distributed Search w/Grouping

    Details

    • Type: Bug Bug
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: 4.0-ALPHA
    • Fix Version/s: None
    • Component/s: search
    • Labels:
      None
    • Environment:

      java version "1.6.0_18"
      OpenJDK Runtime Environment (IcedTea6 1.8) (fedora-41.b18.fc13-x86_64)
      OpenJDK 64-Bit Server VM (build 14.0-b16, mixed mode)

      Description

      I am doing some testing with a nightly build from 11/4/2011 and I regularly encounter an NPE in the JSONWriter when performing a distributed search with grouping on a single numeric field. The same searches non-distributed do not encounter this issue.

      Here is the query being used: facet=off&shards=searcher1-vip:8983/solr/core_0,searcher2-vip:8983/solr/core_1,localhost:8983/solr/core_2&start=0&shards.qt=/item/search&q=nova+y+jory&wt=json&fq=country:ww+OR+country:us&fq=stream:1&rows=4

      Here is the handler configuration:

      <requestHandler name="/item/search" class="solr.SearchHandler">
      <lst name="invariants">
      <str name="echoParams">explicit</str>
      <str name="json.nl">map</str>

      <!-- Query settings -->
      <str name="defType">dismax</str>
      <str name="qf">title name^2.0</str>
      <str name="pf2">title name^2.0</str>
      <int name="ps2">2</int>
      <str name="bf">product(apply_bf,product(sum(premium,1),sum(ln(max(plays,1)),ln(max(sum(count,trend),1)))))^25</str>

      <!-- Faceting defaults -->
      <str name="facet.field">genre</str>
      <int name="f.genre.facet.mincount">1</int>
      <str name="f.genre.facet.method">enum</str>

      <!-- grouping -->
      <str name="group.field">id</str>
      <str name="group.ngroups">true</str>
      <int name="group.cache.percent">25</int>
      </lst>
      <lst name="defaults">
      <str name="wt">json</str>
      <str name="rows">10</str>
      <str name="facet">off</str>
      <str name="group">true</str>
      <str name="fl">contributor_id,album_id,song_id</str>
      <str name="bf">product(apply_bf,sum(ln(max(plays,1)),ln(max(sum(count,trend),1))))^25</str>
      </lst>
      <lst name="appends">
      <!-- query filter for rights date -->
      <str name="fq">start:[* TO NOW/DAY+1DAY] AND end:[NOW/DAY TO *]</str>
      </lst>
      </requestHandler>

      None of the individual shards log any sort of error, however the server that is distributing the search logs the following stack trace fairly regular and returns a 500 error:

      Nov 11, 2011 12:30:01 PM org.apache.solr.common.SolrException log
      SEVERE: java.lang.NullPointerException
      at org.apache.solr.response.JSONWriter.writeSolrDocument(JSONResponseWriter.java:320)
      at org.apache.solr.response.TextResponseWriter.writeSolrDocumentList(TextResponseWriter.java:201)
      at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:169)
      at org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:174)
      at org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:290)
      at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:173)
      at org.apache.solr.response.JSONWriter.writeArray(JSONResponseWriter.java:522)
      at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:175)
      at org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:174)
      at org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:290)
      at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:173)
      at org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:174)
      at org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:294)
      at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:173)
      at org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:174)
      at org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:290)
      at org.apache.solr.response.JSONWriter.writeResponse(JSONResponseWriter.java:87)
      at org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:51)
      at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:340)
      at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:261)
      at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
      at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
      at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
      at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
      at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
      at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
      at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
      at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
      at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
      at org.mortbay.jetty.Server.handle(Server.java:326)
      at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
      at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
      at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
      at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
      at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
      at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
      at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

        Activity

        Hide
        Martijn van Groningen added a comment -

        Does the error only occurs when wt=json in distributed mode? So when xml is used as wt param value the error doesn't occur?
        Do all your documents have a value for the group.field (id field)?
        How often do you commit?

        Show
        Martijn van Groningen added a comment - Does the error only occurs when wt=json in distributed mode? So when xml is used as wt param value the error doesn't occur? Do all your documents have a value for the group.field (id field)? How often do you commit?
        Hide
        Michael Garski added a comment -

        I've not tested with XML, but will do so this week and report my findings. I believe all of the documents have a value for the group field and will confirm that as well. Data is committed on the indexer every 5 minutes, and is replicated to the searcher every 15.

        This could be a coincidence, but I have observed that by disabling the document cache the error rate increases, and then re-enabling it and reloading the core the error rate drops. I have also found that if I trigger a full GC on a shard so it becomes unresponsive for a brief time the searcher that is coordinating the distributed search logs the error in communicating with the shard then the rate of NPEs increases dramatically for a time before recovering.

        Show
        Michael Garski added a comment - I've not tested with XML, but will do so this week and report my findings. I believe all of the documents have a value for the group field and will confirm that as well. Data is committed on the indexer every 5 minutes, and is replicated to the searcher every 15. This could be a coincidence, but I have observed that by disabling the document cache the error rate increases, and then re-enabling it and reloading the core the error rate drops. I have also found that if I trigger a full GC on a shard so it becomes unresponsive for a brief time the searcher that is coordinating the distributed search logs the error in communicating with the shard then the rate of NPEs increases dramatically for a time before recovering.
        Hide
        Michael Garski added a comment -

        I've completed testing that revealed the root cause - out of sync shards on different servers. I think this issue can be closed as it may not be related to grouping and is actually mentioned in the distributed search limitations - "The index could change between stages, e.g. a document that matched a query and was subsequently changed may no longer match but will still be retrieved".

        Show
        Michael Garski added a comment - I've completed testing that revealed the root cause - out of sync shards on different servers. I think this issue can be closed as it may not be related to grouping and is actually mentioned in the distributed search limitations - "The index could change between stages, e.g. a document that matched a query and was subsequently changed may no longer match but will still be retrieved".
        Hide
        Yonik Seeley added a comment -

        I've completed testing that revealed the root cause - out of sync shards on different servers.

        Thanks for tracking that down Michael. As this case is both expected and non-catastrophic, we should definitely fix things to handle it gracefully.

        Show
        Yonik Seeley added a comment - I've completed testing that revealed the root cause - out of sync shards on different servers. Thanks for tracking that down Michael. As this case is both expected and non-catastrophic, we should definitely fix things to handle it gracefully.

          People

          • Assignee:
            Unassigned
            Reporter:
            Michael Garski
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:

              Development