Lucene - Core / LUCENE-1676

New Token filter for adding payloads "in-stream"


Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9
    • Component/s: modules/analysis
    • Labels: None

    Description

      This TokenFilter splits each token on a delimiter, using one part as the token and the other part as its payload. This lets someone include payloads inline with tokens (presumably set up by a pipeline ahead of time). An example is apropos. Given a | delimiter, we could have a stream that looks like:

      The quick|JJ red|JJ fox|NN jumped|VB over the lazy|JJ brown|JJ dogs|NN

      Assuming whitespace tokenization, this would produce the following tokens and payloads:
      Token: The
      Payload: null

      Token: quick
      Payload: JJ

      Token: red
      Payload: JJ

      and so on.
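
      The splitting described above can be sketched in plain Java, without any Lucene dependency. This is only an illustration, not the code in the patch: the class and method names (DelimitedPayloadSketch, TokenWithPayload, split) are hypothetical, and splitting at the first occurrence of the delimiter is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: not the TokenFilter from the patch.
final class DelimitedPayloadSketch {

    /** A term plus its payload text; payload is null when no delimiter was found. */
    static final class TokenWithPayload {
        final String term;
        final String payload;

        TokenWithPayload(String term, String payload) {
            this.term = term;
            this.payload = payload;
        }
    }

    /** Whitespace-tokenizes the text, then splits each token at the first delimiter (assumption). */
    static List<TokenWithPayload> split(String text, char delimiter) {
        List<TokenWithPayload> out = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue; // empty input yields a single empty string
            }
            int idx = raw.indexOf(delimiter);
            if (idx < 0) {
                out.add(new TokenWithPayload(raw, null));
            } else {
                out.add(new TokenWithPayload(raw.substring(0, idx), raw.substring(idx + 1)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (TokenWithPayload t : split("The quick|JJ red|JJ fox|NN", '|')) {
            System.out.println("Token: " + t.term + "  Payload: " + t.payload);
        }
    }
}
```

      Tokens without a delimiter keep a null payload, matching the first token in the example stream.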

      This patch also supports pluggable encoders for the payloads, so the payload can be converted from a character array to a byte array as appropriate.
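
      One way to picture a pluggable encoder is as a small interface that turns the payload's characters into bytes. The sketch below is an assumption about the shape of such an interface, not the API in the patch: PayloadEncoderSketch, Encoder, IDENTITY, FLOAT and encodePayload are hypothetical names, and the UTF-8 and 4-byte big-endian float encodings are illustrative choices.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of pluggable payload encoders; names are illustrative only.
final class PayloadEncoderSketch {

    /** Converts a slice of a character array into payload bytes. */
    interface Encoder {
        byte[] encode(char[] buffer, int offset, int length);
    }

    /** Stores the payload characters as UTF-8 bytes. */
    static final Encoder IDENTITY =
        (buf, off, len) -> new String(buf, off, len).getBytes(StandardCharsets.UTF_8);

    /** Parses the payload as a float and stores it as 4 big-endian bytes. */
    static final Encoder FLOAT =
        (buf, off, len) -> ByteBuffer.allocate(4)
            .putFloat(Float.parseFloat(new String(buf, off, len)))
            .array();

    /** Returns the encoded payload of a raw token, or null if it has no delimiter. */
    static byte[] encodePayload(String rawToken, char delimiter, Encoder encoder) {
        int idx = rawToken.indexOf(delimiter);
        if (idx < 0) {
            return null;
        }
        char[] chars = rawToken.toCharArray();
        return encoder.encode(chars, idx + 1, chars.length - idx - 1);
    }

    public static void main(String[] args) {
        byte[] tag = encodePayload("quick|JJ", '|', IDENTITY);
        byte[] weight = encodePayload("fox|2.5", '|', FLOAT);
        System.out.println("tag bytes: " + tag.length + ", weight bytes: " + weight.length);
    }
}
```

      Keeping the encoder separate from the splitting logic means the same delimiter handling can serve textual payloads such as part-of-speech tags and numeric payloads such as weights.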

          People

            gsingers Grant Ingersoll