Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-8550

Add asynchronous DaemonStreams to the Streaming API

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 6.0
    • Fix Version/s: 6.0
    • Component/s: None
    • Labels:
      None

      Description

      Currently all streams in the Streaming API are synchronously pulled by a client.

      It would be great to add the capability to have asyncronous DaemonStreams that live within Solr that can push content as well. This would facilite large scale alerting and background aggregation use cases.

      1. SOLR-8550.patch
        32 kB
        Joel Bernstein
      2. SOLR-8550.patch
        30 kB
        Joel Bernstein
      3. SOLR-8550.patch
        29 kB
        Joel Bernstein
      4. SOLR-8550.patch
        28 kB
        Joel Bernstein
      5. SOLR-8550.patch
        22 kB
        Joel Bernstein
      6. SOLR-8550.patch
        20 kB
        Joel Bernstein
      7. SOLR-8550.patch
        16 kB
        Joel Bernstein
      8. SOLR-8550.patch
        12 kB
        Joel Bernstein
      9. SOLR-8550.patch
        12 kB
        Joel Bernstein
      10. SOLR-8550.patch
        10 kB
        Joel Bernstein
      11. SOLR-8550.patch
        6 kB
        Joel Bernstein
      12. SOLR-8550.patch
        6 kB
        Joel Bernstein
      13. SOLR-8550.patch
        6 kB
        Joel Bernstein
      14. SOLR-8550.patch
        5 kB
        Joel Bernstein
      15. SOLR-8550.patch
        5 kB
        Joel Bernstein
      16. SOLR-8550.patch
        5 kB
        Joel Bernstein
      17. SOLR-8550.patch
        4 kB
        Joel Bernstein

        Issue Links

          Activity

          Hide
          joel.bernstein Joel Bernstein added a comment - - edited

          The general design is to add a new DaemonStream which will be handled differently by the /stream handler. When the /stream handler sees an DaemonStream it will open it and just keep it around in a memory. The DaemonStream will have a thread that wakes up periodically and opens, reads, and closes it's underlying stream. Syntax would look like this:

          daemon(alert())
          

          The AlertStream would be a new stream created in a different ticket.

          Parallel async streams should work fine as well, facilitating very large scale alerting systems:

          parallel(daemon(alert()))
          

          In the parallel example the DaemonStream would be pushed to worker nodes where they would live.

          An example of background aggregation:

          parallel(daemon(update(rollup(...))))
          
          Show
          joel.bernstein Joel Bernstein added a comment - - edited The general design is to add a new DaemonStream which will be handled differently by the /stream handler. When the /stream handler sees an DaemonStream it will open it and just keep it around in a memory. The DaemonStream will have a thread that wakes up periodically and opens, reads, and closes it's underlying stream. Syntax would look like this: daemon(alert()) The AlertStream would be a new stream created in a different ticket. Parallel async streams should work fine as well, facilitating very large scale alerting systems: parallel(daemon(alert())) In the parallel example the DaemonStream would be pushed to worker nodes where they would live. An example of background aggregation: parallel(daemon(update(rollup(...))))
          Hide
          joel.bernstein Joel Bernstein added a comment - - edited

          First attempt at the DaemonStream. No tests yet.

          Show
          joel.bernstein Joel Bernstein added a comment - - edited First attempt at the DaemonStream. No tests yet.
          Hide
          joel.bernstein Joel Bernstein added a comment - - edited

          Added an Info Tuple to report stats kept by the DaemonStream

          Show
          joel.bernstein Joel Bernstein added a comment - - edited Added an Info Tuple to report stats kept by the DaemonStream
          Hide
          risdenk Kevin Risden added a comment -

          Any thoughts about how a daemon stream be stopped/started/cancelled/removed?

          Show
          risdenk Kevin Risden added a comment - Any thoughts about how a daemon stream be stopped/started/cancelled/removed?
          Hide
          joel.bernstein Joel Bernstein added a comment -

          I was thinking of adding simple admin commands to the /stream handler (list, stop, start, remove etc..)

          The DeamonStreams are also meant to be held onto outside of Solr by another application that wants continous streaming.

          Show
          joel.bernstein Joel Bernstein added a comment - I was thinking of adding simple admin commands to the /stream handler (list, stop, start, remove etc..) The DeamonStreams are also meant to be held onto outside of Solr by another application that wants continous streaming.
          Hide
          joel.bernstein Joel Bernstein added a comment - - edited

          To give you an idea of where this is eventually headed, imagine a system with Daemons that are building models:

          daemon(update(logit()))
          

          And then other Daemons that are alerting based on those models:

          daemon(alert())
          

          This is basically an AI driven alerting system.

          Show
          joel.bernstein Joel Bernstein added a comment - - edited To give you an idea of where this is eventually headed, imagine a system with Daemons that are building models: daemon(update(logit())) And then other Daemons that are alerting based on those models: daemon(alert()) This is basically an AI driven alerting system.
          Hide
          joel.bernstein Joel Bernstein added a comment -

          New patch begins the work on the /stream handler. Initial patch just starts the DaemonStream and puts it in a map. A Tuple is returned which includes the core where the Daemon started. In parallel mode this will return a Tuple from each worker the Daemon was started on.

          Show
          joel.bernstein Joel Bernstein added a comment - New patch begins the work on the /stream handler. Initial patch just starts the DaemonStream and puts it in a map. A Tuple is returned which includes the core where the Daemon started. In parallel mode this will return a Tuple from each worker the Daemon was started on.
          Hide
          joel.bernstein Joel Bernstein added a comment -

          Added redimentary API for stoping, starting and listing DaemonStreams on the /stream handler

          Show
          joel.bernstein Joel Bernstein added a comment - Added redimentary API for stoping, starting and listing DaemonStreams on the /stream handler
          Hide
          joel.bernstein Joel Bernstein added a comment -

          Added Thread state to info Tuple.

          I think the next step is to start adding tests.

          Show
          joel.bernstein Joel Bernstein added a comment - Added Thread state to info Tuple. I think the next step is to start adding tests.
          Hide
          joel.bernstein Joel Bernstein added a comment -

          New patch making the DaemonStream Expressible

          Show
          joel.bernstein Joel Bernstein added a comment - New patch making the DaemonStream Expressible
          Hide
          joel.bernstein Joel Bernstein added a comment - - edited

          Added a simple test that iterates 10 times over a rollup. The data doesn't change underneath so the values are the same each time. This tests the internal thread and queue setup of the DaemonStream. It also demonstrates how a DaemonStream.read() blocks until there is new data in the queue populated by the underlying Stream.

          The RollupStream needed to be changed so that it resets it's state upon closing to support calling open() and close() multiple times on the same stream. Any Stream wrapped by the DaemonStream will need to do the same.

          Show
          joel.bernstein Joel Bernstein added a comment - - edited Added a simple test that iterates 10 times over a rollup. The data doesn't change underneath so the values are the same each time. This tests the internal thread and queue setup of the DaemonStream. It also demonstrates how a DaemonStream.read() blocks until there is new data in the queue populated by the underlying Stream. The RollupStream needed to be changed so that it resets it's state upon closing to support calling open() and close() multiple times on the same stream. Any Stream wrapped by the DaemonStream will need to do the same.
          Hide
          joel.bernstein Joel Bernstein added a comment -

          Expanded the tests to show that the rollup count changed after a new document was added to the index. The daemon stream was left open and the changed rollup count was streamed out after the new document was indexed.

          This tests the main functionality of the DaemonStream.

          The next step is to test the DaemonStream running inside of the /stream handler.

          Show
          joel.bernstein Joel Bernstein added a comment - Expanded the tests to show that the rollup count changed after a new document was added to the index. The daemon stream was left open and the changed rollup count was streamed out after the new document was indexed. This tests the main functionality of the DaemonStream. The next step is to test the DaemonStream running inside of the /stream handler.
          Hide
          joel.bernstein Joel Bernstein added a comment - - edited

          New patch that tests parallel DaemonStream's, running inside of the /stream handler, updating a collection. Still need to expand the tests to include the different actions that are supported.

          Show
          joel.bernstein Joel Bernstein added a comment - - edited New patch that tests parallel DaemonStream's, running inside of the /stream handler, updating a collection. Still need to expand the tests to include the different actions that are supported.
          Hide
          joel.bernstein Joel Bernstein added a comment -

          New patch tests the list action.

          I think the next step is manual testing. If that looks good I think this ready to commit.

          Show
          joel.bernstein Joel Bernstein added a comment - New patch tests the list action. I think the next step is manual testing. If that looks good I think this ready to commit.
          Hide
          joel.bernstein Joel Bernstein added a comment -

          Added StreamExpressionToExpression test

          Show
          joel.bernstein Joel Bernstein added a comment - Added StreamExpressionToExpression test
          Hide
          joel.bernstein Joel Bernstein added a comment - - edited

          Added error logging and a test that ensures the DaemonStream thread has been stopped when using the stop action. This also exercises the list action.

          These tests are passing every time now locally and a whole lot of things have to go right for these tests to pass. I'm feeling pretty good about committing this.

          Show
          joel.bernstein Joel Bernstein added a comment - - edited Added error logging and a test that ensures the DaemonStream thread has been stopped when using the stop action. This also exercises the list action. These tests are passing every time now locally and a whole lot of things have to go right for these tests to pass. I'm feeling pretty good about committing this.
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1726291 from Joel Bernstein in branch 'dev/trunk'
          [ https://svn.apache.org/r1726291 ]

          SOLR-8550: Add asynchronous DaemonStreams to the Streaming API

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1726291 from Joel Bernstein in branch 'dev/trunk' [ https://svn.apache.org/r1726291 ] SOLR-8550 : Add asynchronous DaemonStreams to the Streaming API

            People

            • Assignee:
              Unassigned
              Reporter:
              joel.bernstein Joel Bernstein
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development