SOLR-11916 (project: Solr)

New SortableTextField using docValues built from the original string input


    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 7.3, 8.0
    • Component/s: Schema and Analysis
    • Labels:
      None

      Description

      I propose adding a new SortableTextField, a subclass of TextField that would work functionally the same as TextField except:

      • docValues="true|false" could be configured, with the default being "true"
      • The docValues would contain the original input values (just like StrField), for use in sorting (or faceting)
        • By default, to protect users from excessively large docValues, only the first 1024 characters of each field value would be used; this limit could be overridden with configuration.
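
      In a schema, the first bullet could look like the sketch below. It assumes the proposed default of docValues="true" and the "text_sortable" type from the sample configuration that follows; the field name "summary" is hypothetical.

      ```xml
      <!-- docValues not set: the proposed default ("true") applies, so docValues are
           built from the original input and the field can be sorted on -->
      <field name="title" type="text_sortable" indexed="true" stored="true"/>
      <!-- Hypothetical field that opts out: no docValues are built from the original input,
           so it behaves like a plain TextField -->
      <field name="summary" type="text_sortable" indexed="true" stored="true" docValues="false"/>
      ```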

      Consider the following sample configuration:

      <field name="title" type="text_sortable"
             indexed="true" docValues="true" stored="true" multiValued="false"/>
      <fieldType name="text_sortable" class="solr.SortableTextField">
        <analyzer type="index">
         ...
        </analyzer>
        <analyzer type="query">
         ...
        </analyzer>
      </fieldType>
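
      For illustration only, here is the same field type with the "..." placeholders filled in by a hypothetical analyzer chain (StandardTokenizerFactory plus LowerCaseFilterFactory is an assumption, not part of the proposal). Under the proposal, the analyzers only shape the indexed terms used for searching; the docValues used for sorting are still built from the original string.

      ```xml
      <fieldType name="text_sortable" class="solr.SortableTextField">
        <analyzer type="index">
          <!-- Hypothetical chain: split on word boundaries, then lowercase,
               e.g. "Solr In Action" is indexed as the terms solr, in, action -->
          <tokenizer class="solr.StandardTokenizerFactory"/>
          <filter class="solr.LowerCaseFilterFactory"/>
        </analyzer>
        <analyzer type="query">
          <!-- Same chain at query time so that q=title:Solr matches the indexed term "solr" -->
          <tokenizer class="solr.StandardTokenizerFactory"/>
          <filter class="solr.LowerCaseFilterFactory"/>
        </analyzer>
      </fieldType>
      ```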
      

      Given a document with a title of "Solr In Action", users could:

      • Search for individual (indexed) terms in the "title" field: q=title:solr
      • Sort documents by title (sort=title asc), with this document's sort value being the original string "Solr In Action"
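
      Such a document could be sent to Solr as an XML update message like the one below. This is a sketch that assumes the schema's uniqueKey field is named "id" (hypothetical; the proposal does not specify one).

      ```xml
      <add>
        <doc>
          <!-- Assumed uniqueKey field -->
          <field name="id">1</field>
          <!-- Indexed terms come from the analyzer chain (so q=title:solr can match);
               the docValues used by sort=title asc hold the original string -->
          <field name="title">Solr In Action</field>
        </doc>
      </add>
      ```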

      If another document had a "title" value longer than 1024 characters, its docValues would be built from only the first 1024 characters of that value (unless the user changed the configuration).
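
      Overriding the limit might look like the sketch below. The proposal does not name the option; the attribute "maxCharsForDocValues" is an assumption taken from the name the Solr Reference Guide documents for SortableTextField in released versions, so check the guide for the Solr version in use. The type name "text_sortable_long" and the value 4096 are hypothetical.

      ```xml
      <!-- Assumed attribute name: maxCharsForDocValues.
           Raises the docValues truncation limit from the default of 1024 to 4096 characters -->
      <fieldType name="text_sortable_long" class="solr.SortableTextField" maxCharsForDocValues="4096">
        <!-- A single <analyzer> with no type attribute is used at both index and query time -->
        <analyzer>
          <tokenizer class="solr.StandardTokenizerFactory"/>
          <filter class="solr.LowerCaseFilterFactory"/>
        </analyzer>
      </fieldType>
      ```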

      This would be functionally equivalent to the following existing configuration, including the on-disk index segments, except that:

      • the on-disk docValues would belong directly to the "title" field, reducing the total number of "field infos" in the index (which has a small impact on segment housekeeping and merge times), and
      • end users would not need to sort on an alternate "title_string" field; the original "title" field name would always be used directly.

      <field name="title" type="text"
             indexed="true" docValues="true" stored="true" multiValued="false"/>
      <field name="title_string" type="string"
             indexed="false" docValues="true" stored="false" multiValued="false"/>
      <copyField source="title" dest="title_string" maxChars="1024" />
      

      Attachments

      1. SOLR-11916.patch (60 kB, Hoss Man)
      2. SOLR-11916.patch (52 kB, Hoss Man)

      People

      • Assignee: Hoss Man (hossman)
      • Reporter: Hoss Man (hossman)
      • Votes: 0
      • Watchers: 5
