Solr / SOLR-7867

Grouped faceting fails in an implicitly sharded collection on a multivalued string field whose values start with digits

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 5.2
    • Fix Version/s: None
    • Component/s: faceting, SolrCloud
    • Environment:

      3.13.0-48-generic #80-Ubuntu SMP x86_64 GNU/Linux
      java version "1.7.0_80"
      Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
      Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)

      Description

      Relevant parts of schema.xml:

      <field name="keyword_ss" type="string" indexed="true" stored="true" docValues="true" multiValued="true"/>
      <field name="author_s" type="string" indexed="true" stored="true" docValues="true"/>

      Every document has valid author_s and keyword_ss fields.

      Grouped facet queries like the following succeed on a single-node, single-collection solr-4.9.0 server:

      q: *:* fq: keyword_ss:3m
      facet=true&facet.field=keyword_ss&group=true&group.field=author_s&group.facet=true
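
For anyone trying to reproduce this, the request above can be assembled programmatically. This is only a minimal sketch: the base URL and the collection name `document` are hypothetical placeholders, and the helper name `grouped_facet_url` is made up for illustration. The query parameters themselves are the ones listed above.

```python
from urllib.parse import urlencode

def grouped_facet_url(base_url, collection, fq):
    """Build a /select URL for the grouped facet query from this report."""
    params = [
        ("q", "*:*"),
        ("fq", fq),                      # e.g. keyword_ss:3m
        ("facet", "true"),
        ("facet.field", "keyword_ss"),
        ("group", "true"),
        ("group.field", "author_s"),
        ("group.facet", "true"),
    ]
    # urlencode percent-encodes reserved characters such as '*' and ':'.
    return f"{base_url}/{collection}/select?{urlencode(params)}"

# Hypothetical host and collection name, for illustration only.
url = grouped_facet_url("http://localhost:8983/solr", "document", "keyword_ss:3m")
print(url)
```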
      

      When querying a solr-5.2.0 server in an implicitly sharded environment with:

      <!-- router.field -->
      <field name="shard_name" type="string" indexed="true" stored="true" required="true"/>

      and example shard names affinity1, affinity2, affinity3, affinity4,

      the same query on the same documents fails with:

      ERROR - 2015-08-04 08:15:15.222; [document affinity3 core_node32 document_affinity3_replica2] org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: Exception during facet.field: keyword_ss
              at org.apache.solr.request.SimpleFacets$3.call(SimpleFacets.java:632)
              at org.apache.solr.request.SimpleFacets$3.call(SimpleFacets.java:617)
              at java.util.concurrent.FutureTask.run(FutureTask.java:262)
              at org.apache.solr.request.SimpleFacets$2.execute(SimpleFacets.java:571)
              at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:642)
      ...
              at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
              at java.lang.Thread.run(Thread.java:745)
      Caused by: java.lang.ArrayIndexOutOfBoundsException
              at org.apache.lucene.codecs.lucene50.Lucene50DocValuesProducer$CompressedBinaryDocValues$CompressedBinaryTermsEnum.readTerm(Lucene50DocValuesProducer.java:1008)
              at org.apache.lucene.codecs.lucene50.Lucene50DocValuesProducer$CompressedBinaryDocValues$CompressedBinaryTermsEnum.next(Lucene50DocValuesProducer.java:1026)
              at org.apache.lucene.search.grouping.term.TermGroupFacetCollector$MV$SegmentResult.nextTerm(TermGroupFacetCollector.java:373)
              at org.apache.lucene.search.grouping.AbstractGroupFacetCollector.mergeSegmentResults(AbstractGroupFacetCollector.java:91)
              at org.apache.solr.request.SimpleFacets.getGroupedCounts(SimpleFacets.java:541)
              at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:463)
              at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:386)
              at org.apache.solr.request.SimpleFacets$3.call(SimpleFacets.java:626)
              ... 33 more
      

      All the problematic queries involve strings starting with digits ("3m", "8 saniye", "2 broke girls", "1v1y").
      Some strings starting with digits do work, e.g. "24", "90+", "45 dakika".

      We do not observe the problem when querying with
      -keyword_ss:(0-9)*

      Updating the problematic documents (a small subset of keyword_ss:(0-9)*) fixes the query,
      but we have not found an easy way to identify the problematic documents.
      There are around 400 million docs, spread across 28 shards;
      -keyword_ss:(0-9)* matches 97% of the documents.
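
One way to at least narrow down which keyword values trigger the exception is to probe candidate values one at a time. The sketch below is an assumption-laden illustration, not something we have run against the cluster: `run_query` is a hypothetical caller-supplied function that sends the grouped facet query with the given `fq` and raises when the server returns an error. The sketch only identifies failing values, not the individual documents behind them.

```python
def find_failing_values(candidates, run_query):
    """Return the candidate keyword_ss values whose grouped facet query raises.

    run_query(fq) is a hypothetical callable: it should execute the grouped
    facet query with the given filter and raise on an error response.
    """
    failing = []
    for value in candidates:
        # Quote the value so entries containing spaces ("8 saniye") stay one term.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        fq = f'keyword_ss:"{escaped}"'
        try:
            run_query(fq)
        except Exception:
            failing.append(value)
    return failing
```

For example, `find_failing_values(["3m", "24", "8 saniye"], run_query)` would return the subset of those three values for which the query failed.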

        Attachments

        1. ErrorReadingDocValues.PNG
          112 kB
          Jonathan Gonzalez
        2. DocValuesException.PNG
          39 kB
          Jonathan Gonzalez

            People

            • Assignee: Unassigned
            • Reporter: Umut Erogul (umuero)
            • Votes: 5
            • Watchers: 10
