Solr / SOLR-2112

Solrj should support streaming response

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-ALPHA
    • Component/s: clients - java
    • Labels:
      None

      Description

      The solrj API should optionally support streaming documents.

      Rather than putting all results into a SolrDocumentList, solrj should be able to call a callback function after each document is parsed. This would allow someone to call query.setRows( Integer.MAX_VALUE ) and get each result to the client without loading them all into memory.

      For starters, I think the important things to stream are SolrDocuments, but down the road, this could also stream other things (consider reading all terms from the index)

      1. SOLR-2112-StreamingSolrj.patch
        12 kB
        Ryan McKinley
      2. SOLR-2112-StreamingSolrj.patch
        17 kB
        Ryan McKinley
      3. SOLR-2112-StreamingSolrj.patch
        17 kB
        Ryan McKinley

        Activity

        Ryan McKinley created issue -
        Ryan McKinley added a comment -

        Here is a patch to add streaming. It adds this top-level method to SolrServer:

          QueryResponse queryAndStreamResponse( SolrParams params, StreamingResponseCallback callback ) 
        
        public interface StreamingResponseCallback {
          public void documentRead( SolrDocument doc );
          public void documentListInfo( long numFound, long start, Float maxScore );
        }
        

        This is implemented by hacking the BinaryResponseWriter (embedded) and JavaBinCodec (http) to send events rather than write/read documents.
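
A minimal sketch of how a caller might use such a callback, assuming nothing about the actual patch internals. To stay self-contained it does not use SolrJ: the `StreamingResponseCallback` interface below copies the two method signatures quoted above with `Map<String, Object>` standing in for `SolrDocument`, and `queryAndStream` and `CountingCallback` are hypothetical names invented for illustration. The point is only that the callback sees each document as it is "parsed", so nothing accumulates in a list.

```java
import java.util.HashMap;
import java.util.Map;

public class StreamingCallbackSketch {

    /** Stand-in for the callback interface quoted above; Map replaces SolrDocument. */
    interface StreamingResponseCallback {
        void documentRead(Map<String, Object> doc);
        void documentListInfo(long numFound, long start, Float maxScore);
    }

    /**
     * Hypothetical stand-in for the server side: "parses" documents one by one
     * and hands each to the callback instead of collecting them in a list.
     */
    static void queryAndStream(int numDocs, StreamingResponseCallback callback) {
        callback.documentListInfo(numDocs, 0, null);
        for (int i = 0; i < numDocs; i++) {
            Map<String, Object> doc = new HashMap<>();
            doc.put("id", "doc-" + i);
            callback.documentRead(doc); // no reference to doc is kept after this call
        }
    }

    /** Callback that keeps only running totals, never the documents themselves. */
    static final class CountingCallback implements StreamingResponseCallback {
        long announced = -1;
        long seen = 0;
        String lastId = null;

        @Override
        public void documentRead(Map<String, Object> doc) {
            seen++;
            lastId = (String) doc.get("id");
        }

        @Override
        public void documentListInfo(long numFound, long start, Float maxScore) {
            announced = numFound;
        }
    }

    public static void main(String[] args) {
        CountingCallback cb = new CountingCallback();
        queryAndStream(100_000, cb);
        System.out.println(cb.seen + " of " + cb.announced + " documents, last id " + cb.lastId);
    }
}
```

Memory use in this sketch stays flat regardless of `numDocs`, which is the property the description asks for when `rows` is very large.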

        Ryan McKinley made changes -
        Field Original Value New Value
        Attachment SOLR-2112-StreamingSolrj.patch [ 12454132 ]
        Ryan McKinley added a comment -

        this patch has better comments and includes some missing files

        Ryan McKinley made changes -
        Attachment SOLR-2112-StreamingSolrj.patch [ 12454135 ]
        Ryan McKinley made changes -
        Summary: "Solr should support streaming response" changed to "Solrj should support streaming response"
        Ryan McKinley added a comment -

        I would like to commit this soon (just to /trunk) unless there are objections

        Yonik Seeley added a comment -

        Can StreamingResponseCallback be an abstract class for easier back compat?
        I imagine we could want to stream other stuff in the future (output from terms component, facet component, term vector component, etc).

        Ryan McKinley added a comment -

        ah yes, good point.

        Here is an updated patch using:

        public abstract class StreamingResponseCallback {
          /*
           * Called for each SolrDocument in the response
           */
          public abstract void streamSolrDocument( SolrDocument doc );
        
          /*
           * Called at the beginning of each DocList (and SolrDocumentList)
           */
          public abstract void streamDocListInfo( long numFound, long start, Float maxScore );
        }
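
A rough sketch of why the abstract class helps with back compat, under stated assumptions. When this was discussed, Java interfaces could not carry method bodies (default methods arrived later, in Java 8), so adding a method to an interface broke every implementer. With an abstract class, a new hook can ship with an empty body. The `streamTerm` method below is hypothetical and not part of the patch, `OldCallback` is an invented example subclass, and `Map<String, Object>` stands in for `SolrDocument`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BackCompatSketch {

    /** Mirrors the abstract class above, with Map standing in for SolrDocument. */
    abstract static class StreamingResponseCallback {
        public abstract void streamSolrDocument(Map<String, Object> doc);

        public abstract void streamDocListInfo(long numFound, long start, Float maxScore);

        /**
         * Hypothetical hook added in a later release (NOT part of the patch).
         * The empty body means existing subclasses keep compiling unchanged.
         */
        public void streamTerm(String field, String term) {
            // no-op by default
        }
    }

    /** A subclass written before streamTerm existed; it needs no changes. */
    static class OldCallback extends StreamingResponseCallback {
        final List<Object> ids = new ArrayList<>();
        long numFound;

        @Override
        public void streamSolrDocument(Map<String, Object> doc) {
            ids.add(doc.get("id"));
        }

        @Override
        public void streamDocListInfo(long numFound, long start, Float maxScore) {
            this.numFound = numFound;
        }
    }

    public static void main(String[] args) {
        OldCallback cb = new OldCallback();
        cb.streamDocListInfo(1, 0, null);

        Map<String, Object> doc = new HashMap<>();
        doc.put("id", "a");
        cb.streamSolrDocument(doc);

        cb.streamTerm("title", "solr"); // ignored by the old subclass
        System.out.println(cb.numFound + " found, ids=" + cb.ids);
    }
}
```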
        
        Ryan McKinley made changes -
        Attachment SOLR-2112-StreamingSolrj.patch [ 12454480 ]
        Ryan McKinley added a comment -

        added in r996693

        I'm not sure what the 3.x release schedule looks like, so I'm not sure whether backporting makes sense. I think keeping it on /trunk for a while makes sense until we know this is the API we want.

        Ryan McKinley made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Uwe Schindler made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Mikhail Khludnev added a comment -

        fwiw,
        server-side streaming is out of scope for this issue. Some time ago I did a hack to allow streaming the response while results are being collected; if somebody needs it, feel free to raise an issue. The code is here: https://github.com/m-khl/solr-patches/compare/streaming

        Otis Gospodnetic added a comment -

        Mikhail Khludnev - are you using it somewhere? Does it work well? Is it "contributable"?

        Mikhail Khludnev added a comment -

        I don't run it; I just played with it a year ago, up to the point of passing the distributed search test (see GitHub). I'm ready to collaborate if anyone is interested. The most questionable thing is the overall design. To discuss it we need two separate issues, I suppose: one for the core ability and one for distributed support.
        IIRC there is a special component that injects its own delegating collector, which writes into the ServletResponse while the search is running. It requires sort=_docid_&rows=0 (to avoid buffering results).
        Then, for distributed search, the index should be presorted to keep internal and external ids monotonic. I didn't cover index sorting.
        But after getting past all these traps we get yet another MapReduce platform! Sounds cool, doesn't it?
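
The delegating-collector idea described above can be sketched roughly as follows. This is not the code from the linked branch and does not use Lucene or the Servlet API: `Collector`, `CountingCollector`, and `StreamingCollector` are simplified, hypothetical stand-ins, and a `StringWriter` plays the role of the servlet response. It only illustrates the shape of the technique, where each hit is written out at the moment it is collected and then passed on to the wrapped collector.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

public class DelegatingCollectorSketch {

    /** Hypothetical, simplified stand-in for a search collector. */
    interface Collector {
        void collect(int docId);
    }

    /** Counts hits; plays the role of the collector the search would normally use. */
    static final class CountingCollector implements Collector {
        int hits = 0;

        @Override
        public void collect(int docId) {
            hits++;
        }
    }

    /** Writes each doc id to the output as it is collected, then delegates. */
    static final class StreamingCollector implements Collector {
        private final Collector delegate;
        private final Writer out;

        StreamingCollector(Collector delegate, Writer out) {
            this.delegate = delegate;
            this.out = out;
        }

        @Override
        public void collect(int docId) {
            try {
                out.write(docId + "\n"); // written immediately; no result list is built
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            delegate.collect(docId);
        }
    }

    public static void main(String[] args) {
        StringWriter response = new StringWriter(); // stands in for the servlet response
        CountingCollector counter = new CountingCollector();
        Collector collector = new StreamingCollector(counter, response);

        for (int docId : new int[] {0, 3, 7}) { // ids arrive in index order
            collector.collect(docId);
        }
        System.out.print(response);
        System.out.println("hits=" + counter.hits);
    }
}
```

Because hits are emitted in the order they are collected, the output order is simply index order, which is consistent with the comment's requirement to sort by doc id and set rows=0 so that nothing is buffered for ranking.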

        Transition           Time In Source Status   Execution Times   Last Executer   Last Execution Date
        Open -> Resolved     5d 3h 19m               1                 Ryan McKinley   13/Sep/10 22:32
        Resolved -> Closed   969d 13h 8m             1                 Uwe Schindler   10/May/13 11:40

          People

          • Assignee: Unassigned
          • Reporter: Ryan McKinley
          • Votes: 0
          • Watchers: 5
