SOLR-10351

Add analyze Stream Evaluator to support streaming NLP


Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Resolved
    • Affects Version/s: None
    • Fix Version/s: 6.6, 7.0
    • Component/s: None

    Description

      The analyze Stream Evaluator uses a Solr analyzer to return a collection of tokens from a text field. The collection of tokens can then be streamed out by the cartesianProduct Streaming Expression or attached to documents as multi-valued fields by the select Streaming Expression.

      This allows Streaming Expressions to leverage all the existing tokenizers and filters and provides a place for future NLP analyzers to be added to Streaming Expressions.

      Sample syntax:

      cartesianProduct(expr, analyze(analyzerField, textField) as outfield )
      
      select(expr, analyze(analyzerField, textField) as outfield )
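
      The difference between the two forms can be illustrated with a small toy model in plain Python. This is only a sketch of the semantics described above, not Solr code: toy_analyzer is a hypothetical stand-in for a Solr analyzer (it lowercases and splits on whitespace), and the cartesian_product and select functions here are illustrative names that mimic how the token collection is either streamed out one tuple per token or attached as a multi-valued field.

      ```python
      from typing import Callable, Dict, Iterable, Iterator, List

      Doc = Dict[str, object]


      def toy_analyzer(text: str) -> List[str]:
          """Hypothetical stand-in for a Solr analyzer: lowercase, split on whitespace."""
          return text.lower().split()


      def cartesian_product(
          tuples: Iterable[Doc],
          text_field: str,
          out_field: str,
          analyzer: Callable[[str], List[str]] = toy_analyzer,
      ) -> Iterator[Doc]:
          """Emit one output tuple per token, copying the other fields of the input tuple."""
          for t in tuples:
              for token in analyzer(str(t[text_field])):
                  out = dict(t)
                  out[out_field] = token
                  yield out


      def select(
          tuples: Iterable[Doc],
          text_field: str,
          out_field: str,
          analyzer: Callable[[str], List[str]] = toy_analyzer,
      ) -> Iterator[Doc]:
          """Emit one output tuple per input tuple, with all tokens attached as a list."""
          for t in tuples:
              out = dict(t)
              out[out_field] = analyzer(str(t[text_field]))
              yield out


      # Made-up sample tuples standing in for the output of the inner expr.
      docs = [{"id": "1", "body": "Streaming NLP"}, {"id": "2", "body": "Solr"}]

      exploded = list(cartesian_product(docs, "body", "outfield"))
      attached = list(select(docs, "body", "outfield"))
      ```

      In this model, exploded holds three tuples (one per token across both documents), while attached holds two tuples whose outfield is a list of tokens.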
      

      Combined with Solr's batch text processing capabilities, this provides a complete framework for parallel NLP. Those batch processing capabilities are described here:

      Batch jobs, Parallel ETL and Streaming Text Transformation
      http://joelsolr.blogspot.com/2016/10/solr-63-batch-jobs-parallel-etl-and.html

      Attachments

        1. SOLR-10351.patch (9 kB, Joel Bernstein)
        2. SOLR-10351.patch (18 kB, Joel Bernstein)
        3. SOLR-10351.patch (21 kB, Joel Bernstein)
        4. SOLR-10351.patch (21 kB, Joel Bernstein)


          People

            Assignee: Joel Bernstein (jbernste)
            Reporter: Joel Bernstein (jbernste)
            Votes: 0
            Watchers: 5
