SOLR-1997

analyzed field: Store internal value instead of input one

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 4.9, 5.0
    • Component/s: None
    • Labels:
      None

      Description

      Solr implements a set of filters and tokenizers for filtering and processing text, but when a field is set to be stored, the stored text is the original input. This is useful when the end user reads the input, but not in other cases: for example, when the text carries payloads and looks like A|2.0 good|1.0 day|3.0, or when the result of a query is post-processed by something like Carrot2.

      So this is a simple new kind of field that takes as input the output of a given type (the source) and then performs the normal processing with the desired tokenizers and filters. The difference is that the stored value is the output of the source type, and this is what is retrieved when getting the document.
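
      The behaviour described above can be illustrated with a small stand-alone sketch. It is not the patch's code and does not use any Solr or Lucene API: source_analyzer, whitespace_tokenizer and analyzed_field are hypothetical names, and the payload stripping plus lowercasing are only an assumed example of what a source type might do.

```python
from typing import Callable, List, Tuple

Analyzer = Callable[[str], List[str]]


def source_analyzer(text: str) -> List[str]:
    """Hypothetical stand-in for the source type: split on whitespace,
    drop a trailing "|payload" from each token, and lowercase the term."""
    return [token.split("|", 1)[0].lower() for token in text.split()]


def whitespace_tokenizer(text: str) -> List[str]:
    """Stand-in for the field's own analyzer: split on whitespace only."""
    return text.split()


def analyzed_field(raw: str, pre_process: Analyzer,
                   index_analyzer: Analyzer) -> Tuple[str, List[str]]:
    """Return (stored value, indexed tokens) for one input value.

    The stored value is the joined output of the source analyzer,
    not the raw input; the field's own analyzer then runs on it.
    """
    stored = " ".join(pre_process(raw))
    indexed = index_analyzer(stored)
    return stored, indexed


stored, indexed = analyzed_field(
    "A|2.0 good|1.0 day|3.0", source_analyzer, whitespace_tokenizer
)
print(stored)   # a good day
print(indexed)  # ['a', 'good', 'day']
```

      A regular stored field would return the raw string A|2.0 good|1.0 day|3.0 here; in this sketch the payloads never reach the stored value.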

      The name of the field type is AnalyzedField. In the schema it is introduced as follows, creating analyzedSourceType from SourceType:

      <fieldType name="SourceType" class="solr.TextField" >
        <analyzer type="index">
          <tokenizer class="solr.StandardTokenizerFactory" />
          <filter class......." />
        </analyzer>
        <analyzer type="query">
          <tokenizer class="solr.StandardTokenizerFactory" />
          <filter ....." />
        </analyzer>
      </fieldType>

      <fieldType name="analyzedSourceType" class="solr.AnalyzedField" positionIncrementGap="100" preProcessType="SourceType">
        <analyzer>
          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        </analyzer>
      </fieldType>

      Often just the WhitespaceTokenizerFactory is needed, as the tokens have already been split by the SourceType.

      Finally, a field can be declared as

      <field name="analyzedData" type="analyzedSourceType" indexed="true" stored="true" termVectors="true" multiValued="true"/>

      which can be written directly or can be defined as a copy of the source one:

      <field name="Data" type="analyzedSourceType" indexed="true" stored="true" termVectors="true" multiValued="true"/>
      ...
      <copyField source="Data" dest="analyzedData"/>

      1. SOLR-1997-1.5.patch
        11 kB
        Joan Codina
      2. SOLR-1997-1.4.patch
        5 kB
        Joan Codina

        Activity

        Uwe Schindler made changes -
        Fix Version/s 4.9 [ 12326731 ]
        Fix Version/s 5.0 [ 12321664 ]
        Fix Version/s 4.8 [ 12326254 ]
        Uwe Schindler added a comment -

        Move issue to Solr 4.9.

        David Smiley made changes -
        Fix Version/s 4.8 [ 12326254 ]
        Fix Version/s 4.7 [ 12325573 ]
        Uwe Schindler made changes -
        Fix Version/s 4.7 [ 12325573 ]
        Fix Version/s 4.6 [ 12325000 ]
        Adrien Grand made changes -
        Fix Version/s 4.6 [ 12325000 ]
        Fix Version/s 5.0 [ 12321664 ]
        Fix Version/s 4.5 [ 12324743 ]
        Steve Rowe made changes -
        Fix Version/s 5.0 [ 12321664 ]
        Fix Version/s 4.5 [ 12324743 ]
        Fix Version/s 4.4 [ 12324324 ]
        Steve Rowe added a comment -

        Bulk move 4.4 issues to 4.5 and 5.0

        Uwe Schindler made changes -
        Fix Version/s 4.4 [ 12324324 ]
        Fix Version/s 4.3 [ 12324128 ]
        Robert Muir made changes -
        Fix Version/s 4.3 [ 12324128 ]
        Fix Version/s 5.0 [ 12321664 ]
        Fix Version/s 4.2 [ 12323893 ]
        Mark Miller made changes -
        Fix Version/s 4.2 [ 12323893 ]
        Fix Version/s 5.0 [ 12321664 ]
        Fix Version/s 4.1 [ 12321141 ]
        Robert Muir made changes -
        Fix Version/s 4.1 [ 12321141 ]
        Fix Version/s 4.0 [ 12314992 ]
        Hoss Man made changes -
        Fix Version/s 3.6 [ 12319065 ]
        Hoss Man added a comment -

        Bulk of fixVersion=3.6 -> fixVersion=4.0 for issues that have no assignee and have not been updated recently.

        email notification suppressed to prevent mass-spam
        pseudo-unique token identifying these issues: hoss20120321nofix36

        Simon Willnauer made changes -
        Fix Version/s 3.6 [ 12319065 ]
        Fix Version/s 3.5 [ 12317876 ]
        Robert Muir made changes -
        Fix Version/s 3.5 [ 12317876 ]
        Fix Version/s 3.4 [ 12316683 ]
        Robert Muir added a comment -

        3.4 -> 3.5

        Robert Muir made changes -
        Fix Version/s 3.4 [ 12316683 ]
        Fix Version/s 4.0 [ 12314992 ]
        Fix Version/s 3.3 [ 12316471 ]
        Robert Muir made changes -
        Fix Version/s 3.3 [ 12316471 ]
        Fix Version/s 3.2 [ 12316172 ]
        Robert Muir added a comment -

        Bulk move 3.2 -> 3.3

        Hoss Man made changes -
        Fix Version/s 3.2 [ 12316172 ]
        Fix Version/s Next [ 12315093 ]
        Yonik Seeley made changes -
        Fix Version/s Next [ 12315093 ]
        Fix Version/s 1.4 [ 12313351 ]
        Fix Version/s 1.5 [ 12313566 ]
        Fix Version/s 1.4.1 [ 12315096 ]
        Affects Version/s 1.4 [ 12313351 ]
        Affects Version/s 1.5 [ 12313566 ]
        Affects Version/s 1.4.1 [ 12315096 ]
        Joan Codina added a comment -

        With respect to SOLR-1535: from what I understand, it allows loading externally generated data in a given format that is not processed by Solr but is indexed as desired. This is slightly different, as we do process the data with Solr but store it after processing, not before (as Solr usually does).

        With respect to SOLR-314, I think the idea here is much simpler: to store something different from the input, but always using the existing Solr analyzers.

        The idea is that the output of one analyzer is used as the input of a field. As the field stores its input as is, the output of the analyzer is what gets stored.
        Why? For many reasons: for example, if the text includes payloads, we don't want to show them. Or if we remove some labels...
        We can decide to do half of the processing with the previous analyzer and then do some extra processing in the field. In this way we can control what we store and what we index.
        I think these are a few lines of code that add functionality to the schema, so once integrated, users don't need to program.

        Lance Norskog added a comment -

        This overlaps somewhat with SOLR-1535.

        Ryan McKinley added a comment -

        Check an old old issue SOLR-314... that did the same thing

        I'm still torn if this is a good idea or not...

        Joan Codina made changes -
        Attachment patch_solr_1.5_1997_.txt [ 12449249 ]
        Joan Codina made changes -
        Attachment SOLR-1997-1.4.patch [ 12449250 ]
        Attachment SOLR-1997-1.5.patch [ 12449251 ]
        Joan Codina added a comment -

        patch for 1.4 and 1.5 versions

        Joan Codina made changes -
        Field Original Value New Value
        Attachment patch_solr_1.5_1997_.txt [ 12449249 ]
        Joan Codina added a comment -

        Patch for Solr 1.5

        Joan Codina created issue -

          People

          • Assignee: Unassigned
          • Reporter: Joan Codina
          • Votes: 1
          • Watchers: 2
