Lucene - Core / LUCENE-1372

Proposal: introduce more sensible sorting when a doc has multiple values for a term

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: 2.3.2
    • Fix Version/s: None
    • Component/s: core/search
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      At the moment, FieldCacheImpl produces somewhat disconcerting results when sorting on a field that has multiple values for a single document. For example, imagine a field "fruit" which is added to each document one or more times, with the values as follows:

      doc 1: {"apple"}
      doc 2: {"banana"}
      doc 3: {"apple", "banana"}
      doc 4: {"apple", "zebra"}

      If one sorts on the field "fruit", the loop in FieldCacheImpl.stringsIndexCache.createValue() (and similarly for the other methods in the various FieldCacheImpl caches) does the following:

      while (termDocs.next()) {
        retArray[termDocs.doc()] = t;
      }

      which means that we iterate over the terms in their natural order and, for each term, overwrite retArray[doc] for every document containing that term. Effectively, this overwriting means that a string sort in this circumstance will sort by the LAST term lexicographically, so the docs above will effectively be sorted as if they had the single values ("apple", "banana", "banana", "zebra"), which is nonintuitive. Changing this to sort on the first term in the TermEnum seems relatively trivial and low-overhead; while it's not perfect (it's not locale-aware, for example), the behaviour seems much more sensible to me. Interested to see what people think.
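
      To make the difference concrete, here is a minimal standalone sketch (not the attached patch, and not using the Lucene API). It models the term enumeration as a sorted map from term to the documents containing it; the class and method names are hypothetical and exist only for illustration.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the FieldCacheImpl fill loop; no Lucene classes are used.
public class MultiValueSortSketch {

    // Mirrors the loop described above: every term overwrites the slot of each
    // document it appears in, so the lexicographically LAST term wins.
    static String[] lastTermWins(TreeMap<String, int[]> postings, int maxDoc) {
        String[] retArray = new String[maxDoc];
        for (Map.Entry<String, int[]> e : postings.entrySet()) { // terms in natural order
            for (int doc : e.getValue()) {
                retArray[doc] = e.getKey();
            }
        }
        return retArray;
    }

    // Proposed behaviour: only fill a slot that is still empty,
    // so the lexicographically FIRST term wins.
    static String[] firstTermWins(TreeMap<String, int[]> postings, int maxDoc) {
        String[] retArray = new String[maxDoc];
        for (Map.Entry<String, int[]> e : postings.entrySet()) {
            for (int doc : e.getValue()) {
                if (retArray[doc] == null) {
                    retArray[doc] = e.getKey();
                }
            }
        }
        return retArray;
    }

    public static void main(String[] args) {
        // The four example documents, numbered 1 to 4 (slot 0 is unused).
        TreeMap<String, int[]> postings = new TreeMap<>();
        postings.put("apple", new int[] {1, 3, 4});
        postings.put("banana", new int[] {2, 3});
        postings.put("zebra", new int[] {4});

        System.out.println(Arrays.toString(lastTermWins(postings, 5)));
        // prints [null, apple, banana, banana, zebra]
        System.out.println(Arrays.toString(firstTermWins(postings, 5)));
        // prints [null, apple, banana, apple, apple]
    }
}
```

      The only change between the two methods is the null check, which is why the overhead should be small. A real change would need the equivalent check in each of the FieldCacheImpl caches, using whatever "unset" marker suits each value type.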

      Patch to follow.

      1. LUCENE-1372-MultiValueSorters.patch
        18 kB
        Paul Cowan
      2. lucene-multisort.patch
        4 kB
        Paul Cowan

        Issue Links

          Activity

          Field | Original Value | New Value

          Erick Erickson made changes -
          Status | Open [ 1 ] | Resolved [ 5 ]
          Resolution | | Won't Fix [ 2 ]
          Mark Thomas made changes -
          Workflow | Default workflow, editable Closed status [ 12563156 ] | jira [ 12584113 ]
          Mark Thomas made changes -
          Workflow | jira [ 12441171 ] | Default workflow, editable Closed status [ 12563156 ]
          Uwe Schindler made changes -
          Link | | This issue is related to SOLR-940 [ SOLR-940 ]
          Mark Miller made changes -
          Link | | This issue is related to LUCENE-831 [ LUCENE-831 ]
          Paul Cowan made changes -
          Attachment | | LUCENE-1372-MultiValueSorters.patch [ 12401370 ]
          Uwe Schindler made changes -
          Link | | This issue is related to LUCENE-1470 [ LUCENE-1470 ]
          Uwe Schindler made changes -
          Link | | This issue is related to SOLR-940 [ SOLR-940 ]
          Paul Cowan made changes -
          Attachment | | lucene-multisort.patch [ 12389264 ]
          Paul Cowan created issue -

            People

            • Assignee:
              Unassigned
              Reporter:
              Paul Cowan
            • Votes:
              0
              Watchers:
              3

              Dates

              • Created:
                Updated:
                Resolved:
