Solr / SOLR-1690

JSONKeyValueTokenizerFactory -- JSON Tokenizer

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Schema and Analysis
    • Labels: None

      Description

      Sometimes it is nice to group structured data into a single field.

      This (rough) patch takes JSON input and indexes one token per key/value pair in the JSON, with each token written as key:value.

      schema.xml
      <!-- JSON Field Type -->
          <fieldtype name="json" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
            <analyzer type="index">
              <tokenizer class="solr.JSONKeyValueTokenizerFactory" keepArray="true" hierarchicalKey="false"/>
              <filter class="solr.TrimFilterFactory"/>
              <filter class="solr.LowerCaseFilterFactory"/>
            </analyzer>
            <analyzer type="query">
              <tokenizer class="solr.KeywordTokenizerFactory"/>
              <filter class="solr.TrimFilterFactory" />
              <filter class="solr.LowerCaseFilterFactory"/>
            </analyzer>
          </fieldtype>
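
      The index analyzer runs the JSON tokenizer followed by TrimFilterFactory and LowerCaseFilterFactory, while the query analyzer uses KeywordTokenizerFactory, which emits the entire query string as a single token. A query therefore has to supply a complete key:value pair to match. The plain-Java sketch below simulates both chains without Lucene; the class and method names are hypothetical, and String.trim and toLowerCase(Locale.ROOT) only approximate what the real filters do.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical simulation of the two analyzer chains in the schema.xml above.
// It does not use Lucene; it only mimics trimming and lowercasing.
public class JsonFieldAnalysisSketch {

    /** Index side: tokens already emitted by the JSON tokenizer, then trim and lowercase. */
    public static List<String> analyzeIndex(List<String> tokenizerOutput) {
        return tokenizerOutput.stream()
                .map(String::trim)                         // approximates solr.TrimFilterFactory
                .map(t -> t.toLowerCase(Locale.ROOT))      // approximates solr.LowerCaseFilterFactory
                .toList();
    }

    /** Query side: KeywordTokenizer keeps the whole string as one token, then trim and lowercase. */
    public static String analyzeQuery(String query) {
        return query.trim().toLowerCase(Locale.ROOT);
    }

    /** A query matches only if its single token equals one of the indexed tokens. */
    public static boolean matches(List<String> indexed, String query) {
        return indexed.contains(analyzeQuery(query));
    }

    public static void main(String[] args) {
        // Hypothetical raw tokenizer output with mixed case and a trailing space.
        List<String> indexed = analyzeIndex(List.of("Hello:World ", "rank:5"));
        System.out.println(indexed);                           // [hello:world, rank:5]
        System.out.println(matches(indexed, " HELLO:world"));  // true
        System.out.println(matches(indexed, "hello"));         // false: the key alone is not a token
    }
}
```

      With this configuration a search for a key without its value finds nothing. Note also that with the standard Lucene query parser the colon inside the value would most likely need escaping (for example hello\:world).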
      

      Given the text:

       { "hello": "world", "rank":5 }

      it is indexed as two tokens:

      term position      1             2
      term text          hello:world   rank:5
      term type          word          word
      source start,end   12,17         27,28
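
      The following is a minimal sketch of the key:value tokenization shown in the table, not the patch's code. The attachments suggest the patch relies on the noggit JSON parser; to stay self-contained, this sketch hand-rolls a scanner that handles only flat objects with string or bare scalar values (no nesting, arrays, or escape sequences) and ignores the keepArray and hierarchicalKey options, whose behavior is not described here. The names JsonKeyValueSketch, Token, and tokenize are hypothetical. Measured against { "hello": "world", "rank":5 } without leading whitespace, the start,end offsets in the table (end exclusive) cover only the value, not the key, and the sketch follows that convention.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: turns a flat JSON object into "key:value" tokens.
// Assumes well-formed input; no nesting, arrays, or escaped quotes.
public class JsonKeyValueSketch {

    /** One token: position, "key:value" text, type, and the value's offsets in the source. */
    public record Token(int position, String text, String type, int start, int end) {}

    public static List<Token> tokenize(String json) {
        List<Token> tokens = new ArrayList<>();
        int i = json.indexOf('{') + 1;
        int position = 0;
        while (i < json.length()) {
            // Locate the next quoted key; no more quotes means we reached the end.
            int keyStart = json.indexOf('"', i);
            if (keyStart < 0) break;
            int keyEnd = json.indexOf('"', keyStart + 1);
            String key = json.substring(keyStart + 1, keyEnd);

            // Skip the colon and any whitespace before the value.
            i = json.indexOf(':', keyEnd) + 1;
            while (Character.isWhitespace(json.charAt(i))) i++;

            int start;
            int end;
            if (json.charAt(i) == '"') {
                // String value: offsets exclude the surrounding quotes.
                start = i + 1;
                end = json.indexOf('"', start);
                i = end + 1;
            } else {
                // Bare scalar (number, true, false, null): ends at ',', '}', or whitespace.
                start = i;
                end = i;
                while (end < json.length() && ",} \t\r\n".indexOf(json.charAt(end)) < 0) end++;
                i = end;
            }
            position++;
            tokens.add(new Token(position, key + ":" + json.substring(start, end), "word", start, end));
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("{ \"hello\": \"world\", \"rank\":5 }")) {
            System.out.println(t.position() + " " + t.text() + " " + t.type() + " " + t.start() + "," + t.end());
        }
    }
}
```

      Running main prints "1 hello:world word 12,17" and "2 rank:5 word 27,28", the same rows as the table above. A production tokenizer would instead stream events from a real JSON parser so that nested objects and arrays can be handled.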

      Attachments

      1. noggit-1.0-A1.jar (21 kB, Ryan McKinley)
      2. SOLR-1690-JSONKeyValueTokenizerFactory.patch (7 kB, Ryan McKinley)

        Activity

        Ryan McKinley created issue
        Ryan McKinley made changes:
          Attachment: SOLR-1690-JSONKeyValueTokenizerFactory.patch [ 12429152 ]
        Ryan McKinley made changes:
          Attachment: noggit-1.0-A1.jar [ 12429153 ]
        Ryan McKinley made changes:
          Description: revised to add the schema.xml field type example (the current text appears under Description above)

          People

          • Assignee: Unassigned
          • Reporter: Ryan McKinley
          • Votes: 1
          • Watchers: 1
