  1. Solr
  2. SOLR-1020

PreAnalyzed field analyzer


Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: 1.3
    • Fix Version/s: None
    • Component/s: Schema and Analysis
    • Labels: None

    Description

      An Analyzer that produces a TokenStream from XML input containing a marshalled TokenStream. The patch also contains a static TokenStream-to-XML marshaller.

      I put this together quickly and have not tested it in a real environment; I'm posting it to get comments on the solution before I add it to my project. So consider it a beta patch.

      It uses the JSR 173 XML streaming API (StAX), which is included in Java 1.6 and is available for Java 1.5 as a download from https://sjsxp.dev.java.net/
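
To illustrate the marshalling direction, here is a minimal sketch that writes one token with the StAX `XMLStreamWriter`. It is not the code from the attached patch: the class `TokensXmlWriter` and the method `marshalSingle` are hypothetical names, the token is passed as plain values rather than a Lucene `Token`, and the element names follow the XSD in this description.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

/** Hypothetical helper, not part of the patch: marshals a single token to XML. */
public final class TokensXmlWriter {
    private TokensXmlWriter() {}

    /** Returns a <tokens> document holding exactly one <token>. */
    public static String marshalSingle(int positionIncrement, String term, String type,
                                       int startOffset, int endOffset, int flags,
                                       String payloadHex) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartElement("tokens");
            w.writeStartElement("token");
            // Child elements are written in the order the XSD sequence declares them.
            writeLeaf(w, "positionIncrement", Integer.toString(positionIncrement));
            writeLeaf(w, "term", term);
            writeLeaf(w, "type", type);
            writeLeaf(w, "startOffset", Integer.toString(startOffset));
            writeLeaf(w, "endOffset", Integer.toString(endOffset));
            writeLeaf(w, "flags", Integer.toString(flags));
            w.writeStartElement("payload");
            writeLeaf(w, "hex", payloadHex); // only the <hex> variant is written here
            w.writeEndElement(); // payload
            w.writeEndElement(); // token
            w.writeEndElement(); // tokens
            w.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not marshal token", e);
        }
        return out.toString();
    }

    /** Writes <name>text</name>; the writer escapes markup characters in the text. */
    private static void writeLeaf(XMLStreamWriter w, String name, String text)
            throws XMLStreamException {
        w.writeStartElement(name);
        w.writeCharacters(text);
        w.writeEndElement();
    }
}
```

A real marshaller would presumably loop over the TokenStream and emit one `<token>` per token; the single-token form just keeps the sketch short.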

      XSD:

      <?xml version="1.0" encoding="UTF-8"?>
      <xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified"
                 xmlns:xs="http://www.w3.org/2001/XMLSchema">
          <xs:element name="tokens" type="tokensType"/>
          <xs:complexType name="tokensType">
              <xs:sequence>
                  <xs:element type="tokenType" name="token" maxOccurs="unbounded"/>
              </xs:sequence>
          </xs:complexType>
          <xs:complexType name="tokenType">
              <xs:sequence>
                  <xs:element type="xs:int" name="positionIncrement" maxOccurs="1"/>
                  <xs:element type="xs:string" name="term" minOccurs="1" maxOccurs="1"/>
                  <xs:element type="xs:string" name="type" maxOccurs="1"/>
                  <xs:element type="xs:int" name="startOffset" maxOccurs="1"/>
                  <xs:element type="xs:int" name="endOffset" maxOccurs="1"/>
                  <xs:element type="xs:int" name="flags" maxOccurs="1"/>
                  <xs:element type="payloadType" name="payload" maxOccurs="1"/>
              </xs:sequence>
          </xs:complexType>
          <xs:complexType name="payloadType">
              <xs:choice maxOccurs="1" minOccurs="1">
                  <xs:element type="bytesType" name="bytes"/>
                  <xs:element type="xs:string" name="hex"/>
                  <xs:element type="xs:string" name="base64"/>
              </xs:choice>
          </xs:complexType>
          <xs:complexType name="bytesType">
              <xs:sequence>
                  <xs:element type="xs:byte" name="byte" maxOccurs="unbounded" minOccurs="1"/>
              </xs:sequence>
          </xs:complexType>
      </xs:schema>
      

      Although the XSD defines several ways to encode a payload (<bytes>, <hex> and <base64>), only <hex> is currently supported.
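
As an illustration of the `<hex>` variant, the following sketch converts between a hex string and the raw payload bytes. The class name `HexPayload` and its methods are hypothetical, and the patch may do this differently; the sketch assumes two hex digits per byte with no separators, as in the example value `fffefd`.

```java
/** Hypothetical helper, not part of the patch: hex <-> byte[] for payloads. */
public final class HexPayload {
    private HexPayload() {}

    /** Decodes a hex string such as "fffefd" into its raw bytes. */
    public static byte[] decode(String hex) {
        String s = hex.trim();
        if (s.length() % 2 != 0) {
            throw new IllegalArgumentException("Hex payload must have an even length: " + s);
        }
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(s.charAt(2 * i), 16);
            int lo = Character.digit(s.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("Not a hex digit in payload: " + s);
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    /** Encodes raw bytes as lower-case hex, two digits per byte. */
    public static String encode(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(Character.forDigit((b >> 4) & 0xF, 16));
            sb.append(Character.forDigit(b & 0xF, 16));
        }
        return sb.toString();
    }
}
```

With this sketch, `HexPayload.decode("fffefd")` yields the three bytes 0xff, 0xfe, 0xfd, and `encode` reverses it.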

      Example XML:

      <tokens>
        <token>
          <positionIncrement>1</positionIncrement>
          <term>term</term>
          <type>type</type>
          <startOffset>0</startOffset>
          <endOffset>4</endOffset>
          <flags>65535</flags>
          <payload><hex>fffefd</hex></payload>
        </token>
      </tokens>
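
For the reading direction, a document like the example could be parsed with the StAX `XMLStreamReader` roughly as follows. This is a sketch under assumptions, not the analyzer from the patch: `TokensXmlReader` and `ParsedToken` are hypothetical names, the result is a plain value holder rather than a Lucene `Token`, and only the `<hex>` payload variant is read.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper, not part of the patch: reads <tokens> XML into plain objects. */
public final class TokensXmlReader {
    private TokensXmlReader() {}

    /** Plain holder for the fields of one <token> element. */
    public static final class ParsedToken {
        public int positionIncrement = 1;
        public String term;
        public String type;
        public int startOffset;
        public int endOffset;
        public int flags;
        public String payloadHex; // stays null when no <hex> payload is present
    }

    public static List<ParsedToken> parse(String xml) {
        List<ParsedToken> tokens = new ArrayList<ParsedToken>();
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            ParsedToken current = null;
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = r.getLocalName();
                    if ("token".equals(name)) {
                        current = new ParsedToken();
                    } else if (current != null) {
                        readField(r, name, current);
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "token".equals(r.getLocalName())) {
                    tokens.add(current);
                    current = null;
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed tokens XML", e);
        }
        return tokens;
    }

    /** Reads text-only children; <payload> is skipped because it has element content. */
    private static void readField(XMLStreamReader r, String name, ParsedToken t)
            throws XMLStreamException {
        if ("positionIncrement".equals(name)) {
            t.positionIncrement = Integer.parseInt(r.getElementText().trim());
        } else if ("term".equals(name)) {
            t.term = r.getElementText();
        } else if ("type".equals(name)) {
            t.type = r.getElementText();
        } else if ("startOffset".equals(name)) {
            t.startOffset = Integer.parseInt(r.getElementText().trim());
        } else if ("endOffset".equals(name)) {
            t.endOffset = Integer.parseInt(r.getElementText().trim());
        } else if ("flags".equals(name)) {
            t.flags = Integer.parseInt(r.getElementText().trim());
        } else if ("hex".equals(name)) {
            t.payloadHex = r.getElementText().trim();
        }
    }
}
```

An analyzer built on this idea would presumably turn each parsed value into a token of the TokenStream it returns; that wiring depends on the Lucene version and is left out here.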
      

      Attachments

        1. SOLR-1020.txt
          17 kB
          Karl Wettin

          People

            Assignee: Unassigned
            Reporter: Karl Wettin