Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Resolved
    • Affects Version/s: None
    • Fix Version/s: 6.4, 7.0
    • Component/s: None
    • Security Level: Public (Default Security Level. Issues are Public)
    • Labels:
      None

      Description

      SOLR-9559 adds a general purpose parallel task executor for streaming expressions. The executor() function executes a stream of tasks and doesn't have any concept of task priority.

      The priority() function wraps two streams, a high priority stream and a low priority stream. The priority function emits tuples from the high priority stream first, and then the low priority stream.

      The executor() function can then wrap the priority function to see tasks in priority order.

      Pseudo syntax:

      daemon(executor(priority(topic(tasks, q="priority:high"), topic(tasks, q="priority:low"))))
      
      1. SOLR-9684.patch
        15 kB
        Joel Bernstein
      2. SOLR-9684.patch
        6 kB
        Joel Bernstein
      3. SOLR-9684.patch
        6 kB
        Joel Bernstein

        Issue Links

          Activity

          Hide
          joel.bernstein Joel Bernstein added a comment -

          New patch with tests

          Show
          joel.bernstein Joel Bernstein added a comment - New patch with tests
          Hide
          dsmiley David Smiley added a comment -

          When I saw the title of this issue, I thought this was something quite different than what it was – I thought this was about executing something (or emitting tuples) at a certain time or in a periodic fashion.

          We've already got a merge() streaming expression that seems remarkably close to this... the only difference here is favoring one stream's tuples over another. Maybe you could call the feature here mergePrioritized or something like that?

          Show
          dsmiley David Smiley added a comment - When I saw the title of this issue, I thought this was something quite different than what it was – I thought this was about executing something (or emitting tuples) at a certain time or in a periodic fashion. We've already got a merge() streaming expression that seems remarkably close to this... the only difference here is favoring one stream's tuples over another. Maybe you could call the feature here mergePrioritized or something like that?
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit f3fe487970f1e21300bd556d226461a2d51b00f9 in lucene-solr's branch refs/heads/master from Joel Bernstein
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f3fe487 ]

          SOLR-9684: Add schedule Streaming Expression

          Show
          jira-bot ASF subversion and git services added a comment - Commit f3fe487970f1e21300bd556d226461a2d51b00f9 in lucene-solr's branch refs/heads/master from Joel Bernstein [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f3fe487 ] SOLR-9684 : Add schedule Streaming Expression
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit be119d2aa082e176c88dd72c674dbd406d5ec9a2 in lucene-solr's branch refs/heads/branch_6x from Joel Bernstein
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=be119d2 ]

          SOLR-9684: Add schedule Streaming Expression

          Show
          jira-bot ASF subversion and git services added a comment - Commit be119d2aa082e176c88dd72c674dbd406d5ec9a2 in lucene-solr's branch refs/heads/branch_6x from Joel Bernstein [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=be119d2 ] SOLR-9684 : Add schedule Streaming Expression
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 36a691c50d680d1c6977e6185448e06cb21f653d in lucene-solr's branch refs/heads/master from Joel Bernstein
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=36a691c ]

          SOLR-9684: Update CHANGES.txt

          Show
          jira-bot ASF subversion and git services added a comment - Commit 36a691c50d680d1c6977e6185448e06cb21f653d in lucene-solr's branch refs/heads/master from Joel Bernstein [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=36a691c ] SOLR-9684 : Update CHANGES.txt
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit d396f2d81e8ff52e65a8c2743ec3d4cafca13bc5 in lucene-solr's branch refs/heads/branch_6x from Joel Bernstein
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=d396f2d ]

          SOLR-9684: Update CHANGES.txt

          Show
          jira-bot ASF subversion and git services added a comment - Commit d396f2d81e8ff52e65a8c2743ec3d4cafca13bc5 in lucene-solr's branch refs/heads/branch_6x from Joel Bernstein [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=d396f2d ] SOLR-9684 : Update CHANGES.txt
          Hide
          dsmiley David Smiley added a comment -

          Joel Bernstein did you not see my feedback?

          Show
          dsmiley David Smiley added a comment - Joel Bernstein did you not see my feedback?
          Hide
          joel.bernstein Joel Bernstein added a comment -

          Ah, missed it. Reading now..

          Show
          joel.bernstein Joel Bernstein added a comment - Ah, missed it. Reading now..
          Hide
          joel.bernstein Joel Bernstein added a comment -

          We can think about the naming of this some more.

          The reason why I called it 'schedule' is that it schedules higher priority tasks ahead of lower priority tasks. Possibly more scheduling features could be added in the future.

          Show
          joel.bernstein Joel Bernstein added a comment - We can think about the naming of this some more. The reason why I called it 'schedule' is that it schedules higher priority tasks ahead of lower priority tasks. Possibly more scheduling features could be added in the future.
          Hide
          joel.bernstein Joel Bernstein added a comment - - edited

          One of the things I think would be interesting would be to include a cost based scheduler, which we could build into this implementation.

          Each expression implements a cost() method which could be used to determine which tasks to schedule together. But this is going to take more thought and probably involve walking the parse tree to find which collections are involved in the expression.

          Currently also the cost() method is not implemented so we'd have to put some thought into how expressions calculate cost. Fairly soon we'll have to calculate cost for many expressions to support the Calcite cost based join optimizer.

          Show
          joel.bernstein Joel Bernstein added a comment - - edited One of the things I think would be interesting would be to include a cost based scheduler, which we could build into this implementation. Each expression implements a cost() method which could be used to determine which tasks to schedule together. But this is going to take more thought and probably involve walking the parse tree to find which collections are involved in the expression. Currently also the cost() method is not implemented so we'd have to put some thought into how expressions calculate cost. Fairly soon we'll have to calculate cost for many expressions to support the Calcite cost based join optimizer.
          Hide
          dsmiley David Smiley added a comment -

          Using cost() for this sorta thing sounds great... then you could decorate a stream if you want to fix the cost, and then merge() could perhaps use cost. In any case, I really don't like the name "schedule" for this stream.

          Show
          dsmiley David Smiley added a comment - Using cost() for this sorta thing sounds great... then you could decorate a stream if you want to fix the cost, and then merge() could perhaps use cost. In any case, I really don't like the name "schedule" for this stream.
          Hide
          joel.bernstein Joel Bernstein added a comment -

          It think it makes sense for the executor to wrap a scheduler. The semantics of this is nice. We can also use the schedule function as a facade to build out more scheduling capabilities by passing a scheduling algorithm. for example:

          executor(schedule(COST, topic())))
          executor(schedule(CRON, search())))
          executor(schedule(PRIORITY, topic(), topic())))

          The initial release is simple, but a nice first step.

          Show
          joel.bernstein Joel Bernstein added a comment - It think it makes sense for the executor to wrap a scheduler . The semantics of this is nice. We can also use the schedule function as a facade to build out more scheduling capabilities by passing a scheduling algorithm. for example: executor(schedule(COST, topic()))) executor(schedule(CRON, search()))) executor(schedule(PRIORITY, topic(), topic()))) The initial release is simple, but a nice first step.
          Hide
          joel.bernstein Joel Bernstein added a comment -

          We could also consider naming the expression 'priority':

          executor(priority(topic(), topic())
          

          I'll reopen the ticket until this is decided.

          Show
          joel.bernstein Joel Bernstein added a comment - We could also consider naming the expression 'priority': executor(priority(topic(), topic()) I'll reopen the ticket until this is decided.
          Hide
          joel.bernstein Joel Bernstein added a comment -

          Re-opening to possibly change the expression name.

          Show
          joel.bernstein Joel Bernstein added a comment - Re-opening to possibly change the expression name.
          Hide
          dsmiley David Smiley added a comment -

          "priority" is way better than "schedule" IMO.

          (quoting me) We've already got a merge() streaming expression that seems remarkably close to this... the only difference here is favoring one stream's tuples over another. Maybe you could call the feature here mergePrioritized or something like that?

          What do you think of my statement there? Is it at least similar conceptually to merge()? Then shouldn't it be named similarly? No matter what name is chosen, the docs for merge() should point to the one created in this issue as it's awfully similar, even if the code might be fairly different.

          Show
          dsmiley David Smiley added a comment - "priority" is way better than "schedule" IMO. (quoting me) We've already got a merge() streaming expression that seems remarkably close to this... the only difference here is favoring one stream's tuples over another. Maybe you could call the feature here mergePrioritized or something like that? What do you think of my statement there? Is it at least similar conceptually to merge()? Then shouldn't it be named similarly? No matter what name is chosen, the docs for merge() should point to the one created in this issue as it's awfully similar, even if the code might be fairly different.
          Hide
          joel.bernstein Joel Bernstein added a comment - - edited

          Ok, then let's go with priority as the name for this function.

          About the merge function. The merge function is shorthand for "mergeSort". It's designed to merge two streams sorted on the same keys and maintain the sort order. Originally the idea was that the /export handler was a giant sorting engine, and merge was a way to efficiently merge the sorted streams.

          The priority function behaves more like the SQL UNIONALL. But it's different in that priority only picks one stream to iterate on each open/close. This design allows it to iterate the high priority topic, and only iterate the lower priority topic when no new higher priority tasks have entered the index. Because topics work in small batches, new high priority tasks will jump ahead of existing lower priority tasks on each executor run.

          Also the merge function I think fits into the relational algebra category. The priority function is mainly going to be used for task prioritization and execution.

          Eventually we'll need to implement both a UnionStream and UnionAllStream as well.

          Show
          joel.bernstein Joel Bernstein added a comment - - edited Ok, then let's go with priority as the name for this function. About the merge function. The merge function is shorthand for "mergeSort". It's designed to merge two streams sorted on the same keys and maintain the sort order. Originally the idea was that the /export handler was a giant sorting engine, and merge was a way to efficiently merge the sorted streams. The priority function behaves more like the SQL UNIONALL. But it's different in that priority only picks one stream to iterate on each open/close. This design allows it to iterate the high priority topic, and only iterate the lower priority topic when no new higher priority tasks have entered the index. Because topics work in small batches, new high priority tasks will jump ahead of existing lower priority tasks on each executor run. Also the merge function I think fits into the relational algebra category. The priority function is mainly going to be used for task prioritization and execution. Eventually we'll need to implement both a UnionStream and UnionAllStream as well.
          Hide
          dsmiley David Smiley added a comment -

          +1 good, priority it is.

          Too bad merge isn't named mergeSorted which would have been way clearer.

          Show
          dsmiley David Smiley added a comment - +1 good, priority it is. Too bad merge isn't named mergeSorted which would have been way clearer.
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 0999f6779a3341af072d31162a2c88cf1eb8c5d4 in lucene-solr's branch refs/heads/master from Joel Bernstein
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=0999f67 ]

          SOLR-9684: Rename schedule function to priority

          Show
          jira-bot ASF subversion and git services added a comment - Commit 0999f6779a3341af072d31162a2c88cf1eb8c5d4 in lucene-solr's branch refs/heads/master from Joel Bernstein [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=0999f67 ] SOLR-9684 : Rename schedule function to priority
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit dc289bdacf1a5731839132d6aa019b9e23122031 in lucene-solr's branch refs/heads/branch_6x from Joel Bernstein
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=dc289bd ]

          SOLR-9684: Rename schedule function to priority

          Show
          jira-bot ASF subversion and git services added a comment - Commit dc289bdacf1a5731839132d6aa019b9e23122031 in lucene-solr's branch refs/heads/branch_6x from Joel Bernstein [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=dc289bd ] SOLR-9684 : Rename schedule function to priority

            People

            • Assignee:
              joel.bernstein Joel Bernstein
              Reporter:
              joel.bernstein Joel Bernstein
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development