Solr / SOLR-10351

Add analyze Stream Evaluator to support streaming NLP

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Resolved
    • Affects Version/s: None
    • Fix Version/s: 6.6, 7.0
    • Component/s: None
    • Security Level: Public (Default Security Level. Issues are Public)
    • Labels:

      Description

      The analyze Stream Evaluator applies a Solr analyzer to a text field and returns the resulting tokens as a collection. The tokens can then be streamed out by the cartesianProduct Streaming Expression, or attached to documents as a multi-valued field by the select Streaming Expression.

      This lets Streaming Expressions use all of the existing tokenizers and filters, and provides a place to add future NLP analyzers to Streaming Expressions.

      Sample syntax:

      cartesianProduct(expr, analyze(analyzerField, textField) as outfield )
      
      select(expr, analyze(analyzerField, textField) as outfield )
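
The difference between the two forms above can be illustrated with a small Python sketch. This is not Solr code: the `analyze` function here is a hypothetical stand-in that lowercases and splits on whitespace, where Solr would run the analyzer configured for `analyzerField`, and the names `cartesian_product`, `select`, and `docs` are assumptions made for illustration. For brevity the sketch copies every input field to the output.

```python
from typing import Dict, Iterable, Iterator, List

Doc = Dict[str, object]


def analyze(text: str) -> List[str]:
    """Hypothetical stand-in for a Solr analyzer: lowercase, split on whitespace."""
    return text.lower().split()


def cartesian_product(tuples: Iterable[Doc], text_field: str, out_field: str) -> Iterator[Doc]:
    """Emit one output tuple per token, copying the other fields each time."""
    for t in tuples:
        for token in analyze(str(t[text_field])):
            out = dict(t)
            out[out_field] = token
            yield out


def select(tuples: Iterable[Doc], text_field: str, out_field: str) -> Iterator[Doc]:
    """Emit one output tuple per input tuple, with all tokens in a multi-valued field."""
    for t in tuples:
        out = dict(t)
        out[out_field] = analyze(str(t[text_field]))
        yield out


# Made-up input tuples standing in for the output of `expr`.
docs = [
    {"id": "1", "body": "Streaming NLP"},
    {"id": "2", "body": "Solr"},
]

flat = list(cartesian_product(docs, "body", "outfield"))
multi = list(select(docs, "body", "outfield"))
```

In this sketch `flat` holds one tuple per token, while `multi` holds one tuple per document with the tokens gathered into a list.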
      

      Combined with Solr's batch text processing capabilities, this provides a complete parallel NLP framework. Solr's batch processing capabilities are described here:

      Batch jobs, Parallel ETL and Streaming Text Transformation
      http://joelsolr.blogspot.com/2016/10/solr-63-batch-jobs-parallel-etl-and.html

        Attachments

        1. SOLR-10351.patch
          9 kB
          Joel Bernstein
        2. SOLR-10351.patch
          18 kB
          Joel Bernstein
        3. SOLR-10351.patch
          21 kB
          Joel Bernstein
        4. SOLR-10351.patch
          21 kB
          Joel Bernstein

            People

            • Assignee: Joel Bernstein (joel.bernstein)
            • Reporter: Joel Bernstein (joel.bernstein)
            • Votes: 0
            • Watchers: 5