Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-9559

Add ExecutorStream to execute stored Streaming Expressions

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Implemented
    • Affects Version/s: None
    • Fix Version/s: 6.3
    • Component/s: None
    • Security Level: Public (Default Security Level. Issues are Public)
    • Labels:
      None

      Description

      The ExecutorStream will wrap a stream which contains Tuples with Streaming Expressions to execute. By default the ExecutorStream will look for the expression in the expr_s field in the Tuples.

      The ExecutorStream will have an internal thread pool so expressions can be executed in parallel on a single worker. The ExecutorStream can also be wrapped by the parallel function to partition the Streaming Expressions that need to be executed across a cluster of worker nodes.

      Sample syntax:

      daemon(executor(threads=10, topic(storedExpressions, fl="expr_s", ...)))
      

      In the example above a daemon wraps an executor which wraps a topic that is reading stored Streaming Expressions. The daemon will call the executor at intervals which will execute all the expressions retrieved by the topic.

      1. SOLR-9559.patch
        8 kB
        Joel Bernstein
      2. SOLR-9559.patch
        15 kB
        Joel Bernstein
      3. SOLR-9559.patch
        20 kB
        Joel Bernstein
      4. SOLR-9559.patch
        21 kB
        Joel Bernstein

        Issue Links

          Activity

          Hide
          joel.bernstein Joel Bernstein added a comment -

          First implementation of the ExecutorStream

          Show
          joel.bernstein Joel Bernstein added a comment - First implementation of the ExecutorStream
          Hide
          joel.bernstein Joel Bernstein added a comment -

          Added initial test case. A parallel test case is still needed.

          Show
          joel.bernstein Joel Bernstein added a comment - Added initial test case. A parallel test case is still needed.
          Hide
          joel.bernstein Joel Bernstein added a comment -

          Added parallel test case

          Show
          joel.bernstein Joel Bernstein added a comment - Added parallel test case
          Hide
          dsmiley David Smiley added a comment -

          Hi Joel. What do you think about naming this eval instead? That name seems more congruent with the name & purpose of the eval() method in various programming environments. You are very close to the code so I can see how ExecutorStream came to your mind in light of it using an ExecutorService underneath.

          I noticed that StreamTask loops over the tuples and does nothing with the result. Why is that? And might you use Java 7 try-with-resources over there?

          I admit I'm a little confused as to the use-case – why would someone embed a streaming expression be embedded in a tuple? Perhaps some sort of persistent distributed work queue? But then how are error conditions handled... do we concern ourselves with not running the same expression multiple times?

          Show
          dsmiley David Smiley added a comment - Hi Joel. What do you think about naming this eval instead? That name seems more congruent with the name & purpose of the eval() method in various programming environments. You are very close to the code so I can see how ExecutorStream came to your mind in light of it using an ExecutorService underneath. I noticed that StreamTask loops over the tuples and does nothing with the result. Why is that? And might you use Java 7 try-with-resources over there? I admit I'm a little confused as to the use-case – why would someone embed a streaming expression be embedded in a tuple? Perhaps some sort of persistent distributed work queue? But then how are error conditions handled... do we concern ourselves with not running the same expression multiple times?
          Hide
          joel.bernstein Joel Bernstein added a comment - - edited

          All interesting questions.

          I thought about exec and eval but settled on executor because it really is a work queue for streaming expressions. It's a really powerful executor because it's parallel on a single node and can be parallelized across a cluster of worker nodes by wrapping it in the parallel function.

          The StreamTask's job is to iterate the stream. All functionality in streaming expressions is achieved by iterating the stream. In order for something interesting to happen in this scenario you would need to use a stream decorator that pushes data somewhere, such as the update() function. The update function pushes Tuples to another SolrCloud collection.

          For example the executor could be used to train millions of machine learning models and store the models in a SolrCloud collection.

          There are three core use cases for this:

          1) As part of a scalable framework for developing Actor Model systems https://en.wikipedia.org/wiki/Actor_model. The Actor Model is one of core features of Scala and Erlang. The daemon function can be used to construct Actors that interact with each other through work queues and mail boxes.

          2) Massively scalable stored queries and alerts. See the topic function for more details on subscribing to a query.

          3) A general purpose parallel executor / work queue.

          Error handling currently is just logging errors. But there is a lot we can do with error handling as this matures. One of the really nice things about the topic() function is that it persists it's checkpoints in a collection. If you run a job that uses a topic() and it fails in the middle, you can simply start it back up and it picks up where it left off.

          Show
          joel.bernstein Joel Bernstein added a comment - - edited All interesting questions. I thought about exec and eval but settled on executor because it really is a work queue for streaming expressions. It's a really powerful executor because it's parallel on a single node and can be parallelized across a cluster of worker nodes by wrapping it in the parallel function. The StreamTask's job is to iterate the stream. All functionality in streaming expressions is achieved by iterating the stream. In order for something interesting to happen in this scenario you would need to use a stream decorator that pushes data somewhere, such as the update() function. The update function pushes Tuples to another SolrCloud collection. For example the executor could be used to train millions of machine learning models and store the models in a SolrCloud collection. There are three core use cases for this: 1) As part of a scalable framework for developing Actor Model systems https://en.wikipedia.org/wiki/Actor_model . The Actor Model is one of core features of Scala and Erlang. The daemon function can be used to construct Actors that interact with each other through work queues and mail boxes. 2) Massively scalable stored queries and alerts. See the topic function for more details on subscribing to a query. 3) A general purpose parallel executor / work queue. Error handling currently is just logging errors. But there is a lot we can do with error handling as this matures. One of the really nice things about the topic() function is that it persists it's checkpoints in a collection. If you run a job that uses a topic() and it fails in the middle, you can simply start it back up and it picks up where it left off.
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 727bf559a0089d67ddd7eb5ed572f79b67a006c6 in lucene-solr's branch refs/heads/master from Joel Bernstein
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=727bf55 ]

          SOLR-9559: Add ExecutorStream to execute stored Streaming Expressions

          Show
          jira-bot ASF subversion and git services added a comment - Commit 727bf559a0089d67ddd7eb5ed572f79b67a006c6 in lucene-solr's branch refs/heads/master from Joel Bernstein [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=727bf55 ] SOLR-9559 : Add ExecutorStream to execute stored Streaming Expressions
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 67938c2bab9011e1049763368897645a1bf9209f in lucene-solr's branch refs/heads/branch_6x from Joel Bernstein
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=67938c2 ]

          SOLR-9559: Add ExecutorStream to execute stored Streaming Expressions

          Show
          jira-bot ASF subversion and git services added a comment - Commit 67938c2bab9011e1049763368897645a1bf9209f in lucene-solr's branch refs/heads/branch_6x from Joel Bernstein [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=67938c2 ] SOLR-9559 : Add ExecutorStream to execute stored Streaming Expressions
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 7b3d29dda77404b9d2772c0df4bc2fd4d600ed5e in lucene-solr's branch refs/heads/master from Joel Bernstein
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7b3d29d ]

          SOLR-9533, SOLR-9559: Undate CHANGES.txt

          Show
          jira-bot ASF subversion and git services added a comment - Commit 7b3d29dda77404b9d2772c0df4bc2fd4d600ed5e in lucene-solr's branch refs/heads/master from Joel Bernstein [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7b3d29d ] SOLR-9533 , SOLR-9559 : Undate CHANGES.txt
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 0cadbb995bba0743a201980a7fdc6902dc16c4bc in lucene-solr's branch refs/heads/branch_6x from Joel Bernstein
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=0cadbb9 ]

          SOLR-9533, SOLR-9559: Undate CHANGES.txt

          Show
          jira-bot ASF subversion and git services added a comment - Commit 0cadbb995bba0743a201980a7fdc6902dc16c4bc in lucene-solr's branch refs/heads/branch_6x from Joel Bernstein [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=0cadbb9 ] SOLR-9533 , SOLR-9559 : Undate CHANGES.txt
          Hide
          shalinmangar Shalin Shekhar Mangar added a comment -

          Closing after 6.3.0 release.

          Show
          shalinmangar Shalin Shekhar Mangar added a comment - Closing after 6.3.0 release.

            People

            • Assignee:
              joel.bernstein Joel Bernstein
              Reporter:
              joel.bernstein Joel Bernstein
            • Votes:
              1 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development