[SOLR-1020] PreAnalyzed field analyzer - ASF JIRA

Details

Type: New Feature
Status: Closed
Priority: Minor
Resolution: Won't Fix
Affects Version/s: 1.3
Fix Version/s: None
Component/s: Schema and Analysis
Labels: None

Description

An Analyzer that produces a TokenStream from XML input containing a marshalled TokenStream. The patch also contains a static TokenStream-to-XML marshaller.

I pulled this together without testing it in a real environment, in order to get some comments on the solution before I add it to my project. So consider it a beta patch.

It uses the JSR 173 XML streaming API (StAX), which is bundled with Java 1.6. For Java 1.5, a compatible implementation can be downloaded from https://sjsxp.dev.java.net/
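To illustrate how the StAX cursor API can read this kind of token XML, here is a minimal sketch. It is not the code from the attached patch: the class `TokenXmlReader` and the holder `ParsedToken` are hypothetical names, and the sketch collects plain values instead of building a real Lucene TokenStream. Only the element names are taken from this issue.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class TokenXmlReader {

    /** Hypothetical holder for one <token> element; not a Lucene class. */
    public static final class ParsedToken {
        public int positionIncrement = 1;
        public String term;
        public String type;
        public int startOffset;
        public int endOffset;
        public int flags;
        public String payloadHex; // raw content of <payload><hex>, still undecoded
    }

    /** Reads every <token> under <tokens> using the StAX cursor API. */
    public static List<ParsedToken> parse(String xml) {
        List<ParsedToken> tokens = new ArrayList<>();
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            ParsedToken current = null;
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if (name.equals("token")) {
                        current = new ParsedToken();
                    } else if (current != null) {
                        readField(reader, name, current);
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && reader.getLocalName().equals("token")) {
                    tokens.add(current);
                    current = null;
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed token XML", e);
        }
        return tokens;
    }

    // getElementText() is only called on text-only elements; the <payload>
    // wrapper falls through to the default branch so its child is read later.
    private static void readField(XMLStreamReader reader, String name, ParsedToken t)
            throws XMLStreamException {
        switch (name) {
            case "positionIncrement": t.positionIncrement = intText(reader); break;
            case "term": t.term = reader.getElementText(); break;
            case "type": t.type = reader.getElementText(); break;
            case "startOffset": t.startOffset = intText(reader); break;
            case "endOffset": t.endOffset = intText(reader); break;
            case "flags": t.flags = intText(reader); break;
            case "hex": t.payloadHex = reader.getElementText().trim(); break;
            default: break;
        }
    }

    private static int intText(XMLStreamReader reader) throws XMLStreamException {
        return Integer.parseInt(reader.getElementText().trim());
    }
}
```

Calling `TokenXmlReader.parse(xml)` returns one `ParsedToken` per `<token>` element. A real analyzer would copy these values into the token attributes instead of a list.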

XSD:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified"
           xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="tokens" type="tokensType"/>
    <xs:complexType name="tokensType">
        <xs:sequence>
            <xs:element type="tokenType" name="token" maxOccurs="unbounded"/>
        </xs:sequence>
    </xs:complexType>
    <xs:complexType name="tokenType">
        <xs:sequence>
            <xs:element type="xs:int" name="positionIncrement" maxOccurs="1"/>
            <xs:element type="xs:string" name="term" minOccurs="1" maxOccurs="1"/>
            <xs:element type="xs:string" name="type" maxOccurs="1"/>
            <xs:element type="xs:int" name="startOffset" maxOccurs="1"/>
            <xs:element type="xs:int" name="endOffset" maxOccurs="1"/>
            <xs:element type="xs:int" name="flags" maxOccurs="1"/>
            <xs:element type="payloadType" name="payload" maxOccurs="1"/>
        </xs:sequence>
    </xs:complexType>
    <xs:complexType name="payloadType">
        <xs:choice maxOccurs="1" minOccurs="1">
            <xs:element type="bytesType" name="bytes"/>
            <xs:element type="xs:string" name="hex"/>
            <xs:element type="xs:string" name="base64"/>
        </xs:choice>
    </xs:complexType>
    <xs:complexType name="bytesType">
        <xs:sequence>
            <xs:element type="xs:byte" name="byte" maxOccurs="unbounded" minOccurs="1"/>
        </xs:sequence>
    </xs:complexType>
</xs:schema>

Although the XSD defines several ways to encode a payload, only <hex> is currently supported.
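For reference, turning a `<hex>` payload into bytes could look like the sketch below. The class name `HexPayload` and both method names are hypothetical and are not taken from the patch. The sketch assumes two hex digits per byte with no separators, as in the example value `fffefd`.

```java
public class HexPayload {

    /** Decodes a hex string such as "fffefd" into raw payload bytes. */
    public static byte[] decode(String hex) {
        String s = hex.trim();
        if (s.length() % 2 != 0) {
            throw new IllegalArgumentException("Hex payload needs an even number of digits: " + s);
        }
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(s.charAt(2 * i), 16);
            int lo = Character.digit(s.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("Not a hex digit in payload: " + s);
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    /** Encodes payload bytes as lowercase hex, the inverse of decode. */
    public static String encode(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(Character.forDigit((b >> 4) & 0xF, 16));
            sb.append(Character.forDigit(b & 0xF, 16));
        }
        return sb.toString();
    }
}
```

The decoded array could then be wrapped in whatever payload object the Lucene version in use expects.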

Example XML:

<tokens>
  <token>
    <positionIncrement>1</positionIncrement>
    <term>term</term>
    <type>type</type>
    <startOffset>0</startOffset>
    <endOffset>3</endOffset>
    <flags>65535</flags>
    <payload><hex>fffefd</hex></payload>
  </token>
</tokens>
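The marshalling direction can be sketched with `XMLStreamWriter` from the same JSR 173 API. This is an illustration only, not the marshaller from the patch: `TokenXmlWriter` and `marshal` are hypothetical names, and the method takes plain values for a single token instead of reading a TokenStream.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class TokenXmlWriter {

    /** Writes one token in the <tokens>/<token> layout shown in this issue. */
    public static String marshal(int positionIncrement, String term, String type,
                                 int startOffset, int endOffset, int flags,
                                 String payloadHex) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartElement("tokens");
            w.writeStartElement("token");
            element(w, "positionIncrement", Integer.toString(positionIncrement));
            element(w, "term", term);
            element(w, "type", type);
            element(w, "startOffset", Integer.toString(startOffset));
            element(w, "endOffset", Integer.toString(endOffset));
            element(w, "flags", Integer.toString(flags));
            w.writeStartElement("payload");
            element(w, "hex", payloadHex);
            w.writeEndElement(); // payload
            w.writeEndElement(); // token
            w.writeEndElement(); // tokens
            w.writeEndDocument();
            w.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write token XML", e);
        }
        return out.toString();
    }

    // Writes <name>text</name>; writeCharacters escapes &, < and > in the text.
    private static void element(XMLStreamWriter w, String name, String text)
            throws XMLStreamException {
        w.writeStartElement(name);
        w.writeCharacters(text);
        w.writeEndElement();
    }
}
```

Calling `TokenXmlWriter.marshal(1, "term", "type", 0, 3, 65535, "fffefd")` produces the same elements as the example above, without the indentation.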

Attachments

SOLR-1020.txt
14/Feb/09 20:05
17 kB
Karl Wettin

People

Assignee: Unassigned

Reporter: Karl Wettin

Votes: 0

Watchers: 1

Dates

Created: 14/Feb/09 20:03

Updated: 16/Mar/13 19:00

Resolved: 16/Mar/13 18:57