SOLR-1997

analyzed field: Store internal value instead of input one


Details

    • New Feature
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • 4.9, 6.0
    • None
    • None

    Description

      Solr implements a set of filters and tokenizers for filtering and processing text, but when a field is set to be stored, the stored text is the original input. This is useful when the end user reads the stored value, but not in other cases: for example, when there are payloads and the text is something like A|2.0 good|1.0 day|3.0, or when the result of a query is post-processed using something like Carrot2.

      So this is a simple new kind of field that takes as input the output of a given field type (the source) and then performs the normal processing with the desired tokenizers and filters. The difference is that the stored value is the output of the source type, and this is what is retrieved when fetching the document.

      The name of the field type class is AnalyzedField. It is introduced in the schema as follows, creating the analyzed type (analyzedSoureType in the example) from SourceType through the preProcessType attribute:
      <fieldType name="SourceType" class="solr.TextField">
        <analyzer type="index">
          <tokenizer class="solr.StandardTokenizerFactory"/>
          <filter class......." />
        </analyzer>
        <analyzer type="query">
          <tokenizer class="solr.StandardTokenizerFactory"/>
          <filter ....." />
        </analyzer>
      </fieldType>

      <fieldType name="analyzedSoureType" class="solr.AnalyzedField" positionIncrementGap="100" preProcessType="SourceType">
        <analyzer>
          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        </analyzer>
      </fieldType>

      Often only the WhitespaceTokenizerFactory is needed, because the tokens have already been split by SourceType.
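
      To make the stored-versus-input distinction concrete, here is a minimal sketch of a filled-in schema. The lower-case and stop-word filters are hypothetical choices for illustration (the filters in the example above are elided and are not being reconstructed here), stopwords.txt is assumed to contain "the", and the stored value shown in the closing comment assumes the proposed field joins the source type's tokens with single spaces, which the description does not spell out.

```xml
<!-- Hypothetical source type: these filters are illustrative choices,
     not the ones elided in the description. -->
<fieldType name="SourceType" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>

<!-- solr.AnalyzedField is the class proposed in this issue;
     it is not part of stock Solr. -->
<fieldType name="analyzedSoureType" class="solr.AnalyzedField" positionIncrementGap="100" preProcessType="SourceType">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

<!-- Assumed behaviour for one document value:
       input text:      The Quick Brown Fox
       stored value:    quick brown fox   (output of SourceType, not the input)
       indexed tokens:  quick, brown, fox (whitespace split of the stored value) -->
```

      With a regular solr.TextField the stored value would instead be the unmodified input, The Quick Brown Fox.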

      Finally, a field can be declared as:
      <field name="analyzedData" type="analyzedSoureType" indexed="true" stored="true" termVectors="true" multiValued="true"/>

      This field can be written to directly, or it can be populated as a copy of the source field:

      <field name="Data" type="analyzedSoureType" indexed="true" stored="true" termVectors="true" multiValued="true"/>
      ...
      <copyField source="Data" dest="analyzedData"/>


          People

            Unassigned Unassigned
            jcodina Joan Codina
