Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-8588

Add TopicStream to the streaming API to support publish/subscribe messaging

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Implemented
    • Affects Version/s: None
    • Fix Version/s: 6.0
    • Component/s: None
    • Labels:
      None

      Description

      The TopicStream is a publish/subscribe messaging service built on top of SolrCloud. The TopicStream returns all new documents for a specific query. Version numbers will be used as checkpoints for Topics to ensure single delivery of each document. When combined with the DaemonStream (SOLR-8550), Topics can provide continuous streaming. Sample syntax:

      topic(checkpointCollection, dataCollection, id="topicA",  q="awesome stuff" checkpointEvery="1000")
      

      The checkpoint collection will be used to persist the topic checkpoints.

      Example combined with the DaemonStream:

      daemon(topic(...)...)
      

      When combined with SOLR-7739 this allows for messaging based on machine learned classifications.

      The TopicStream supports 3 models of publish/subscribe messaging:

      1) Request & response: In this model a topic(...) expression can be saved and executed at any time. In this scenario the TopicStream will always retrieve it's checkpoints and start from where it left off.

      2) Continuous pull streaming: In this model you would wrap the TopicStream in a DaemonStream and call read() in a loop inside a java program. This would provide a continuous stream of new content as it arrives in the index.

      3) Continuous push streaming: In this model you would send an expression like this to the /stream handler: daemon(update(topic(...)...)...). This daemon process would run inside Solr and continuously stream new documents from the topic and push them to another SolrCloud collection. Other pushing expressions can be created to push documents in different ways or take other types of actions.

      1. SOLR-8588.patch
        29 kB
        Joel Bernstein
      2. SOLR-8588.patch
        26 kB
        Joel Bernstein
      3. SOLR-8588.patch
        26 kB
        Joel Bernstein
      4. SOLR-8588.patch
        25 kB
        Joel Bernstein
      5. SOLR-8588.patch
        16 kB
        Joel Bernstein

        Issue Links

          Activity

          Hide
          joel.bernstein Joel Bernstein added a comment - - edited

          I think this ticket is the one I'm most excited about at the moment because it leverages so many of Solr's strengths. Topics can be arbitrary queries so they don't have to be registered in advance. It leverages Solr's transaction log and version number. SolrCloud replication provides scale and redundancy. Combined with DaemonStreams, topics can "live" inside Solr and continually push data, or can be embedded in client apps to provide continuous streaming.

          Show
          joel.bernstein Joel Bernstein added a comment - - edited I think this ticket is the one I'm most excited about at the moment because it leverages so many of Solr's strengths. Topics can be arbitrary queries so they don't have to be registered in advance. It leverages Solr's transaction log and version number. SolrCloud replication provides scale and redundancy. Combined with DaemonStreams, topics can "live" inside Solr and continually push data, or can be embedded in client apps to provide continuous streaming.
          Hide
          joel.bernstein Joel Bernstein added a comment -

          First pass at the TopicStream implementation. Tests to follow.

          Show
          joel.bernstein Joel Bernstein added a comment - First pass at the TopicStream implementation. Tests to follow.
          Hide
          joel.bernstein Joel Bernstein added a comment -

          Patch with first set of working tests.

          Show
          joel.bernstein Joel Bernstein added a comment - Patch with first set of working tests.
          Hide
          risdenk Kevin Risden added a comment -

          This code could be cleaner I think:

          Collections.shuffle(shuffler, new Random());
          Replica rep = shuffler.get(0);
          

          replace with something like:

          Random random = new Random();
          Replica rep = shuffler.get(random.nextInt(shuffler.size()));
          

          The initialization of random could be done once I think.

          Show
          risdenk Kevin Risden added a comment - This code could be cleaner I think: Collections.shuffle(shuffler, new Random()); Replica rep = shuffler.get(0); replace with something like: Random random = new Random(); Replica rep = shuffler.get(random.nextInt(shuffler.size())); The initialization of random could be done once I think.
          Hide
          joel.bernstein Joel Bernstein added a comment -

          Yep, this is nicer. I'll include this in the next patch.

          Show
          joel.bernstein Joel Bernstein added a comment - Yep, this is nicer. I'll include this in the next patch.
          Hide
          joel.bernstein Joel Bernstein added a comment -

          Added simple test for daemon(topic(...)....) continuous streaming.

          Show
          joel.bernstein Joel Bernstein added a comment - Added simple test for daemon(topic(...)....) continuous streaming.
          Hide
          joel.bernstein Joel Bernstein added a comment -

          Added tests for checkpointing during read() iteration.

          Show
          joel.bernstein Joel Bernstein added a comment - Added tests for checkpointing during read() iteration.
          Hide
          joel.bernstein Joel Bernstein added a comment -

          Next step is manual testing.

          Show
          joel.bernstein Joel Bernstein added a comment - Next step is manual testing.
          Hide
          joel.bernstein Joel Bernstein added a comment -

          Manual testing looks good. It turned up a connection leak, which is now fixed (I'll put up a new patch shortly). Just need to clean up a few things and prepare this for committal for 6.0.

          Show
          joel.bernstein Joel Bernstein added a comment - Manual testing looks good. It turned up a connection leak, which is now fixed (I'll put up a new patch shortly). Just need to clean up a few things and prepare this for committal for 6.0.
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit b2475bf9fdc59c02454f730a6cc4916cff03f862 in lucene-solr's branch refs/heads/master from jbernste
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b2475bf ]

          SOLR-8588: Add TopicStream to the streaming API to support publish/subscribe messaging

          Show
          jira-bot ASF subversion and git services added a comment - Commit b2475bf9fdc59c02454f730a6cc4916cff03f862 in lucene-solr's branch refs/heads/master from jbernste [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b2475bf ] SOLR-8588 : Add TopicStream to the streaming API to support publish/subscribe messaging
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit f9127a919ac212c4a5c36e66fb0d0c15a7867c0e in lucene-solr's branch refs/heads/master from jbernste
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f9127a9 ]

          SOLR-8588: Update CHANGES.txt

          Show
          jira-bot ASF subversion and git services added a comment - Commit f9127a919ac212c4a5c36e66fb0d0c15a7867c0e in lucene-solr's branch refs/heads/master from jbernste [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f9127a9 ] SOLR-8588 : Update CHANGES.txt

            People

            • Assignee:
              joel.bernstein Joel Bernstein
              Reporter:
              joel.bernstein Joel Bernstein
            • Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development