Solr / SOLR-1690

JSONKeyValueTokenizerFactory -- JSON Tokenizer


Details

    • Type: New Feature
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Schema and Analysis
    • Labels: None

    Description

      Sometimes it is nice to group structured data into a single field.

      This (rough) patch takes JSON input and indexes one token per key/value pair in the JSON.

      schema.xml
      <!-- JSON Field Type -->
          <fieldtype name="json" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
            <analyzer type="index">
              <tokenizer class="solr.JSONKeyValueTokenizerFactory" keepArray="true" hierarchicalKey="false"/>
              <filter class="solr.TrimFilterFactory"/>
              <filter class="solr.LowerCaseFilterFactory"/>
            </analyzer>
            <analyzer type="query">
              <tokenizer class="solr.KeywordTokenizerFactory"/>
              <filter class="solr.TrimFilterFactory" />
              <filter class="solr.LowerCaseFilterFactory"/>
            </analyzer>
          </fieldtype>
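The query analyzer above uses KeywordTokenizerFactory, so the whole query string becomes a single term that is then trimmed and lowercased; a query therefore matches only if it equals a complete indexed key:value term. The following is a rough plain-Java stand-in for that chain, not the Lucene/Solr API: String.trim() and toLowerCase(Locale.ROOT) only approximate what TrimFilter and LowerCaseFilter do, and the class and method names are hypothetical.

```java
import java.util.Locale;

/**
 * Hypothetical plain-Java approximation of the "json" field type's query chain:
 * KeywordTokenizer (whole input is one token) -> TrimFilter -> LowerCaseFilter.
 * This is NOT the Lucene/Solr API, only an illustration of the behavior.
 */
public class QueryAnalysisSketch {

    // KeywordTokenizer keeps the entire query as one token; we then trim
    // surrounding whitespace and lowercase it (approximating the two filters).
    static String analyzeQuery(String query) {
        return query.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String indexedTerm = "hello:world"; // a key:value term as produced at index time

        // A full key:value query normalizes to the indexed term and would match.
        System.out.println(analyzeQuery("  Hello:World ").equals(indexedTerm));

        // A bare key is a different single term and would not match.
        System.out.println(analyzeQuery("hello").equals(indexedTerm));
    }
}
```

Note that with the standard Lucene query parser the colon is a special character, so a query against a field of this type (say a hypothetical field named json_field) would need it escaped, e.g. json_field:hello\:world.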
      

      Given text:

       { "hello": "world", "rank":5 }
      

      is indexed as two tokens (the start,end offsets are the character positions of each value in the input string):

      term position      1             2
      term text          hello:world   rank:5
      term type          word          word
      source start,end   12,17         27,28
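The tokenization in the example above can be sketched in plain Java. This is a hypothetical illustration, not the patch's code (the patch attaches the noggit JSON parser, which this sketch does not use): it hand-scans a flat JSON object, emits one "key:value" token per pair, and records the [start, end) offsets of each value with string quotes excluded, which reproduces the 12,17 and 27,28 offsets shown. It assumes a flat object only; nested objects, arrays and escape sequences are rejected, and the keepArray/hierarchicalKey options are not modeled. Requires Java 16+ for the record.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of key:value tokenization for a FLAT JSON object. */
public class JsonKeyValueSketch {

    /** One emitted token: "key:value" text plus the value's [start, end) offsets. */
    record Token(String text, int start, int end) {}

    private final String s;
    private int pos = 0;

    private JsonKeyValueSketch(String s) { this.s = s; }

    /** Tokenizes input such as { "hello": "world", "rank":5 }. */
    static List<Token> tokenize(String json) {
        return new JsonKeyValueSketch(json).parseObject();
    }

    private List<Token> parseObject() {
        List<Token> tokens = new ArrayList<>();
        skipWs();
        expect('{');
        skipWs();
        if (peek() == '}') { pos++; return tokens; } // empty object
        while (true) {
            skipWs();
            int[] keySpan = readString();
            String key = s.substring(keySpan[0], keySpan[1]);
            skipWs();
            expect(':');
            skipWs();
            // String values are quoted; numbers/true/false/null are bare literals.
            int[] valSpan = (peek() == '"') ? readString() : readLiteral();
            String value = s.substring(valSpan[0], valSpan[1]);
            tokens.add(new Token(key + ":" + value, valSpan[0], valSpan[1]));
            skipWs();
            char c = next();
            if (c == '}') return tokens;
            if (c != ',') throw error("expected ',' or '}'");
        }
    }

    /** Reads a quoted string; returns the [start, end) span of its contents. */
    private int[] readString() {
        expect('"');
        int start = pos;
        while (peek() != '"') {
            if (peek() == '\\') throw new UnsupportedOperationException("escapes not handled in this sketch");
            pos++;
        }
        int end = pos;
        pos++; // consume the closing quote
        return new int[] {start, end};
    }

    /** Reads a bare literal up to the next ',', '}' or whitespace. */
    private int[] readLiteral() {
        int start = pos;
        while (pos < s.length() && ",} \t\r\n".indexOf(s.charAt(pos)) < 0) {
            char c = s.charAt(pos);
            if (c == '{' || c == '[') throw new UnsupportedOperationException("nested values not handled in this sketch");
            pos++;
        }
        if (pos == start) throw error("expected a value");
        return new int[] {start, pos};
    }

    private char peek() {
        if (pos >= s.length()) throw error("unexpected end of input");
        return s.charAt(pos);
    }

    private char next() { char c = peek(); pos++; return c; }

    private void expect(char c) { if (next() != c) throw error("expected '" + c + "'"); }

    private void skipWs() {
        while (pos < s.length() && Character.isWhitespace(s.charAt(pos))) pos++;
    }

    private IllegalArgumentException error(String msg) {
        return new IllegalArgumentException(msg + " at offset " + pos);
    }

    public static void main(String[] args) {
        for (Token t : tokenize("{ \"hello\": \"world\", \"rank\":5 }")) {
            System.out.println(t.text() + " [" + t.start() + "," + t.end() + ")");
        }
    }
}
```

In the configured field type, lowercasing and trimming happen in the TrimFilterFactory and LowerCaseFilterFactory steps after the tokenizer, so this sketch leaves token text as it appears in the input.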

      Attachments

        1. noggit-1.0-A1.jar
          21 kB
          Ryan McKinley
        2. SOLR-1690-JSONKeyValueTokenizerFactory.patch
          7 kB
          Ryan McKinley


          People

            Assignee: Unassigned
            Reporter: Ryan McKinley (ryantxu)
            Votes: 2
            Watchers: 3
