Solr / SOLR-13836

Streaming Expression Query Parser

Details

    • Type: New Feature
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved

    Description

      A streaming expression can already call the search handler (via "search(...)"), but a regular search handler request cannot currently invoke a streaming expression. In some cases, it would be useful to leverage the power of streaming expressions to generate a result set and then join that result set with a normal set of search results.

      This isn't expected to be particularly efficient for high-cardinality streaming expression results, but it would be a pretty powerful feature that could enable a number of use cases that aren't possible today within a normal search.

      Example:

      Docs:

      curl -X POST -H "Content-Type: application/json" "http://localhost:8983/solr/food/update?commit=true" --data-binary '
      [
      {"id": "1", "name_s":"donut","vector_fs":[5.0,0.0,1.0,5.0,0.0,4.0,5.0,1.0]},
      {"id": "2", "name_s":"apple juice","vector_fs":[1.0,5.0,0.0,0.0,0.0,4.0,4.0,3.0]},
      {"id": "3", "name_s":"cappuccino","vector_fs":[0.0,5.0,3.0,0.0,4.0,1.0,2.0,3.0]},
      {"id": "4", "name_s":"cheese pizza","vector_fs":[5.0,0.0,4.0,4.0,0.0,1.0,5.0,2.0]},
      {"id": "5", "name_s":"green tea","vector_fs":[0.0,5.0,0.0,0.0,2.0,1.0,1.0,5.0]},
      {"id": "6", "name_s":"latte","vector_fs":[0.0,5.0,4.0,0.0,4.0,1.0,3.0,3.0]},
      {"id": "7", "name_s":"soda","vector_fs":[0.0,5.0,0.0,0.0,3.0,5.0,5.0,0.0]},
      {"id": "8", "name_s":"cheese bread sticks","vector_fs":[5.0,0.0,4.0,5.0,0.0,1.0,4.0,2.0]},
      {"id": "9", "name_s":"water","vector_fs":[0.0,5.0,0.0,0.0,0.0,0.0,0.0,5.0]},
      {"id": "10", "name_s":"cinnamon bread sticks","vector_fs":[5.0,0.0,1.0,5.0,0.0,3.0,4.0,2.0]}
      ]'
      

       

      Query:

      http://localhost:8983/solr/food/select?q=*:*&fq={!streaming_expression}top(select(search(food,%20q=%22*:*%22,%20fl=%22id,vector_fs%22,%20sort=%22id%20asc%22),%20cosineSimilarity(vector_fs,%20array(5.1,0.0,1.0,5.0,0.0,4.0,5.0,1.0))%20as%20cos,%20id),%20n=5,%20sort=%22cos%20desc%22)&fl=id,name_s
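
      Hand-encoding that URL is error-prone, so the sketch below builds it from the readable form of the expression instead. It is only an illustration: the {!streaming_expression} parser is the one proposed in this issue, and the helper name build_select_url is hypothetical. The encoder writes spaces as "+" rather than "%20", which decodes to the same parameter value.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Streaming expression from the example query, written unencoded for readability.
EXPR = (
    'top(select(search(food, q="*:*", fl="id,vector_fs", sort="id asc"), '
    'cosineSimilarity(vector_fs, array(5.1,0.0,1.0,5.0,0.0,4.0,5.0,1.0)) as cos, id), '
    'n=5, sort="cos desc")'
)


def build_select_url(base_url, expr, fl="id,name_s"):
    """Return a /select URL whose fq wraps expr in the proposed query parser.

    Hypothetical helper: the parser name comes from this issue, not from a
    released Solr version.
    """
    params = {
        "q": "*:*",
        "fq": "{!streaming_expression}" + expr,
        "fl": fl,
    }
    # urlencode percent-encodes reserved characters such as {, !, ", ( and ).
    return base_url + "/select?" + urlencode(params)


url = build_select_url("http://localhost:8983/solr/food", EXPR)

# Round-trip check: decoding the URL gives back the original filter string.
decoded_fq = parse_qs(urlsplit(url).query)["fq"][0]
print(decoded_fq)
```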
      

       

      Response:

      {
        "responseHeader":{
          "zkConnected":true,
          "status":0,
          "QTime":7,
          "params":{
            "q":"*:*",
            "fl":"id,name_s",
            "fq":"{!streaming_expression}top(select(search(food, q=\"*:*\", fl=\"id,vector_fs\", sort=\"id asc\"), cosineSimilarity(vector_fs, array(5.1,0.0,1.0,5.0,0.0,4.0,5.0,1.0)) as cos, id), n=5, sort=\"cos desc\")"}},
        "response":{"numFound":5,"start":0,"docs":[
            {
              "name_s":"donut",
              "id":"1"},
            {
              "name_s":"apple juice",
              "id":"2"},
            {
              "name_s":"cheese pizza",
              "id":"4"},
            {
              "name_s":"cheese bread sticks",
              "id":"8"},
            {
              "name_s":"cinnamon bread sticks",
              "id":"10"}]
        }}
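
      To see why these five documents come back, the sketch below recomputes the cosine similarities outside Solr, using the sample vectors and the array(...) value from the request URL. It is a plain-Python emulation under stated assumptions, not the parser's implementation: the function names are hypothetical, and the last step assumes that the filter only selects documents while the main q=*:* query decides their order.

```python
import math

# Sample documents from the example (id -> vector_fs).
DOCS = {
    "1": [5.0, 0.0, 1.0, 5.0, 0.0, 4.0, 5.0, 1.0],
    "2": [1.0, 5.0, 0.0, 0.0, 0.0, 4.0, 4.0, 3.0],
    "3": [0.0, 5.0, 3.0, 0.0, 4.0, 1.0, 2.0, 3.0],
    "4": [5.0, 0.0, 4.0, 4.0, 0.0, 1.0, 5.0, 2.0],
    "5": [0.0, 5.0, 0.0, 0.0, 2.0, 1.0, 1.0, 5.0],
    "6": [0.0, 5.0, 4.0, 0.0, 4.0, 1.0, 3.0, 3.0],
    "7": [0.0, 5.0, 0.0, 0.0, 3.0, 5.0, 5.0, 0.0],
    "8": [5.0, 0.0, 4.0, 5.0, 0.0, 1.0, 4.0, 2.0],
    "9": [0.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0],
    "10": [5.0, 0.0, 1.0, 5.0, 0.0, 3.0, 4.0, 2.0],
}

# Query vector from the array(...) call in the request URL.
QUERY = [5.1, 0.0, 1.0, 5.0, 0.0, 4.0, 5.0, 1.0]


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_ids(docs, query, n=5):
    """Ids of the n vectors most similar to query, best first."""
    ranked = sorted(
        docs,
        key=lambda doc_id: cosine_similarity(docs[doc_id], query),
        reverse=True,
    )
    return ranked[:n]


# The streaming expression yields a set of ids; applied as a filter, the
# matching documents keep the order in which they were indexed.
matching = set(top_ids(DOCS, QUERY))
filtered = [doc_id for doc_id in DOCS if doc_id in matching]
print(filtered)  # ['1', '2', '4', '8', '10']
```

      The filtered list matches the ids in the response above. Note that the documents are not ordered by similarity: "apple juice" has the lowest similarity of the five but appears second.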
      

      The current implementation also supports the following additional parameters:
      f: (optional) The field in the streaming expression's output that contains the document ids to filter on. Defaults to the collection's uniqueKey field name.
      method: (optional) One of termsFilter (default), booleanQuery, automaton, or docValuesTermsFilter.

      The method parameter may go away, especially if we find a more efficient way to join the stream to the main query doc set.
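
      The issue does not show how f and method are passed. The sketch below assumes they follow Solr's usual local-params form, {!parser key=value}, and builds an fq value that way. The helper name streaming_expression_fq is hypothetical, and the list of method values is copied from the description above.

```python
# Method values listed in the issue description; termsFilter is the default.
ALLOWED_METHODS = ("termsFilter", "booleanQuery", "automaton", "docValuesTermsFilter")


def streaming_expression_fq(expr, f=None, method=None):
    """Build an fq value for the proposed parser.

    Assumption: f and method are passed as local params, e.g.
    {!streaming_expression f=id method=automaton}. The issue names the
    parameters but does not spell out this syntax.
    """
    if method is not None and method not in ALLOWED_METHODS:
        raise ValueError(f"unknown method {method!r}; expected one of {ALLOWED_METHODS}")
    local_params = ["!streaming_expression"]
    if f is not None:
        local_params.append(f"f={f}")
    if method is not None:
        local_params.append(f"method={method}")
    return "{" + " ".join(local_params) + "}" + expr


fq = streaming_expression_fq(
    'search(food, q="name_s:*bread*", fl="id", sort="id asc")',
    f="id",
    method="docValuesTermsFilter",
)
print(fq)
```

      Omitting both arguments produces the bare {!streaming_expression} prefix used in the example query, so the defaults described above apply.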
       

            People

              Assignee: Unassigned
              Reporter: Trey Grainger (solrtrey)
              Votes: 0
              Watchers: 5

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 10m