Details

    • Type: Task
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.6
    • Fix Version/s: 0.7
    • Labels:
      None

      Description

      We would like to use runtime isolation: PEs from the same app could be split into several subsets and run in different JVM processes. This would bring benefits such as isolating PEs that consume a lot of resources and making it easier to find a problematic PE.

        Activity

        Matthieu Morel added a comment -

        In other words, you want exclusive execution of a subset of PEs in a given partition, right?

        In my opinion, the cleanest way to achieve this is to extend the event dispatching mechanism. You could restrict some partitions to only receive events from some specified streams, for instance. Then there should be a way to specify the assignment of streams/prototypes to partitions, and a way to check that (through the S4 status command?).
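
        As a purely illustrative sketch (none of these class or stream names exist in S4; it only shows the kind of stream-to-partition restriction being proposed):

        {code}
        import java.util.*;

        // Hypothetical illustration of restricting partitions to events from specific streams.
        class StreamPartitionPolicy {
            private final Map<String, Set<Integer>> streamToPartitions = new HashMap<>();

            void assign(String streamName, Integer... partitionIds) {
                streamToPartitions.put(streamName, new HashSet<>(Arrays.asList(partitionIds)));
            }

            // A dispatcher would consult this before delivering an event to a partition.
            boolean accepts(String streamName, int partitionId) {
                Set<Integer> allowed = streamToPartitions.get(streamName);
                return allowed == null || allowed.contains(partitionId);  // unrestricted streams go anywhere
            }
        }
        {code}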

        Matthieu Morel added a comment -

        Note that this was also discussed some time ago: http://mail-archives.apache.org/mod_mbox/incubator-s4-dev/201204.mbox/thread and http://mail-archives.apache.org/mod_mbox/incubator-s4-dev/201205.mbox/thread

        To summarize, there are 2 designs for achieving isolation:

        1. Using different apps and interconnecting them. This is easy with the dynamic registration mechanism for external streams in 0.5.
        2. Controlling node allocation on a PE basis. This is more fine-grained and requires development; hence this ticket.
        Aimee Cheng added a comment -

        I think design 2 is a better choice: from a user's perspective, a simple deployment is preferable.

        So providing a flexible way is important, and we have also had some discussion about this on the mailing list.

        1. Allow users to control PE partitioning.
        Because PEs have different CPU and memory requirements, allowing users to set different node allocations for them is a good way to avoid potential performance bottlenecks. In most situations, users do not care which specific partitions run an exclusive PE; they only care how many partitions run it. So in our opinion, specifying a total number of partitions is more practical (see the sketch after this list).

        2. Exclusive PE allocation.
        We fully agree with the "exclusive allocation" mode, so that CPU/memory-intensive PEs can have their own processes without affecting the others; it may also make it easier to find the root-cause PE when issues occur. But we also want to know whether it is practical to provide a "non-exclusive allocation" mode. In the end, the implementation seems to come down to the ratio between partition counts.
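
        A rough sketch of the allocation idea in point 1 (hypothetical code, not part of any patch; all names are made up): given a total partition count and the number of exclusive partitions each PE asks for, reserve those partitions and leave the remainder shared.

        {code}
        import java.util.*;

        // Hypothetical sketch: allocate N exclusive partitions per PE out of a total
        // partition count, and leave the remaining partitions shared.
        class PartitionAllocationSketch {
            static Map<String, List<Integer>> allocate(int totalPartitions, Map<String, Integer> exclusiveCounts) {
                Map<String, List<Integer>> assignment = new LinkedHashMap<>();
                int next = 0;
                for (Map.Entry<String, Integer> e : exclusiveCounts.entrySet()) {
                    List<Integer> ids = new ArrayList<>();
                    for (int i = 0; i < e.getValue(); i++) {
                        ids.add(next++);              // reserve the next partition id exclusively
                    }
                    assignment.put(e.getKey(), ids);
                }
                List<Integer> shared = new ArrayList<>();
                for (int p = next; p < totalPartitions; p++) {
                    shared.add(p);                    // remaining partitions are shared by non-exclusive PEs
                }
                assignment.put("shared", shared);
                return assignment;
            }
        }
        {code}

        For example, with 10 partitions in total and one heavy PE asking for 2 exclusive partitions, that PE would get partitions 0-1 and all other PEs would share partitions 2-9.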

        Aimee Cheng added a comment - edited

        The patch is based on S4-95.

        Aimee Cheng added a comment -

        Implementation of runtime isolation.

        Aimee Cheng added a comment - edited

        The patch I uploaded adds a feature for making a PE exclusive. By setting an exclusive count N for a PE, N partitions will be allocated to that PE at deployment time. The other, non-exclusive PEs will be deployed to the remaining partitions as before, keeping the symmetric design.

        To distinguish the PE partition from the cluster partition, I use "global partition" to refer to the partition id within a cluster. The main changes are in App, Stream and Sender.

        1. In App, add a schedule method for allocating global partitions to PEs.

        2. In Sender, change send(String hashkey, Event event) to send(int partition, Event event). The sender can no longer decide which global partition to send to from the hashkey alone; the PE type also needs to be known. I moved this part of the logic into Stream, letting the PE decide which global partition the event should be sent to.

        3. In Stream, add logic that iterates over the PEs and checks whether the event needs to be delivered locally.
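
        A heavily hedged sketch of what the dispatch path in points 2 and 3 might look like (the actual patch code may differ; the surrounding context, package layout and helper names are assumed):

        {code}
        import java.util.List;

        import org.apache.s4.base.Event;   // package layout assumed

        // Sketch of how a Stream could choose the target global partition once each PE
        // prototype knows its own partition list (as produced by the App's schedule step).
        class DispatchSketch {

            interface PartitionSender {            // stands in for the patched Sender.send(int, Event)
                void send(int globalPartition, Event event);
            }

            void dispatch(String hashKey, Event event, List<Integer> partitionsForThisPE, PartitionSender sender) {
                int index = (hashKey.hashCode() & Integer.MAX_VALUE) % partitionsForThisPE.size();
                int globalPartition = partitionsForThisPE.get(index);  // map the key into this PE's partition set
                sender.send(globalPartition, event);                   // new signature: send(int partition, Event event)
            }
        }
        {code}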

        There are two test cases for this feature: one tests the scheduling, and the other reuses the WordCount test case to check that the logic is correct.

        Matthieu Morel added a comment -

        Aimee, can you upload the patch to the review board (as a simple diff file)? Thanks!

        Aimee Cheng added a comment -

        Ok. Done!

        Daniel Gómez Ferro added a comment -

        Aimee, thanks for the patch! I integrated it in a new branch, S4-91, commit 16d50da32c6f4f093eec9985e7f70486cbdce56c.

        I modified it a bit due to recent changes on the dev branch.

        It seems that currently the test doesn't check that the PEs are indeed running exclusively on some nodes. It would be better to check that explicitly.

        Also, I wonder what happens when PEs run exclusively on some nodes and the cluster reads from a remote stream. Since the remote emitter does a round robin, wouldn't the exclusive partitions also receive these events?

        Aimee Cheng added a comment -

        Hi Daniel, thanks for pointing that out! I'll think more about how to check the data and update the test cases. About the remote stream, you are right: the remote stream part is missing from the current patch, and it will have problems when receiving a remote stream. RemoteStream needs to be changed to use information about the PEs to decide which nodes the events should be sent to.

        I plan to move the round robin from the remote emitter into RemoteStream. When doing the round robin, the remote events would be sent to the exclusive nodes as well as to all the other nodes, in the same way as the current code. What do you think of that?

        By the way, I am updating the code to fix the problem you mentioned, and I expect to upload it this weekend.

        Daniel Gómez Ferro added a comment -

        For checking that the PEs are running exclusively on some nodes, you could write some information to ZK when the PEs process the events, for example the partitionId that processed each one. Then, at the end of the test, check that only the expected number of partitions ran each PE.
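
        A minimal sketch of that suggestion (paths, helper names and the test assertion are made up; it assumes the I0Itec ZkClient API):

        {code}
        import org.I0Itec.zkclient.ZkClient;

        // Called from the PE's event-processing method under test: record which
        // partition processed an event by creating a marker node per partition.
        class ExclusivityRecorder {
            void recordProcessingPartition(ZkClient zkClient, String peName, int partitionId) {
                String path = "/test/processedBy/" + peName + "/" + partitionId;   // hypothetical path
                if (!zkClient.exists(path)) {
                    zkClient.createPersistent(path, true);   // create parents; idempotent enough for a test
                }
            }
        }

        // At the end of the test, assert that only the expected number of partitions ran the PE, e.g.:
        // assertEquals(expectedExclusiveCount, zkClient.getChildren("/test/processedBy/ExclusivePE").size());
        {code}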

        As for the remote emitters, the problem is that you don't know if the PEs are exclusive or how they are partitioned, you'll need to store this information in ZK probably.

        Having an update this weekend would be great!

        Matthieu Morel added a comment -

        We don't have a working patch yet, so this is rescheduled for 0.7, which focuses on cluster and partitioning management.

        Aimee Cheng added a comment -

        It seems late... but I hope we can still try to merge it into this release.

        This patch now supports remote streams, and there are two test cases that check both the local stream and the remote stream. All the other tests also pass.

        One problem is that I didn't find a proper way to inject the zkClient into App; the approaches I tried always threw exceptions. For now I use a simple approach; it can be changed.

        Daniel Gómez Ferro added a comment -

        Thanks for the patch Aimee! I integrated it in branch S4-91 and added some small cleanups.

        The added functionality works well, I'm in favor of merging it. It would be great if someone else could review it as well!

        Matthieu Morel added a comment -

        Thanks for the review and updates Daniel!

        The partition abstraction is also quite nice and could actually be passed to the sender directly. When working on the Helix integration with Kishore, we had come up with a similar "Destination" abstraction.

        Nevertheless, there are still some things to update:

        • The App.createZKNodeForPartition method will be called for each PE prototype on each node. I'm not sure this scales well, and the data should be written only once (see the sketch after this list).
        • The isolation feature impacts the critical path for sending events even when it is not activated/available. E.g. in RemoteSender there is code that could be skipped if there is no isolation (that's quite easy to fix though).
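
        A hedged sketch of one way to make the write happen only once (not the actual implementation; the path is hypothetical and it assumes the I0Itec ZkClient API):

        {code}
        import java.util.List;

        import org.I0Itec.zkclient.ZkClient;

        // Publish the global partition list for a PE prototype only once (or at least
        // idempotently), even though every node runs this code for every prototype.
        class PartitionAssignmentPublisher {
            void publishOnce(ZkClient zkClient, String prototypeName, List<Integer> partitions) {
                String path = "/s4/partitionAssignments/" + prototypeName;   // hypothetical path
                if (zkClient.exists(path)) {
                    return;                              // another node already published the assignment
                }
                zkClient.createPersistent(path, true);   // create the node (and any parents) if missing
                zkClient.writeData(path, partitions);    // concurrent writers publish the same data, so a race is harmless
            }
        }
        {code}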

        It would be really nice to have this patch in the 0.6 release, but unfortunately I'm not sure how long fixing the above issues will take. Therefore I'll prepare the 0.6 release first; then we should fix this patch and possibly add it to a minor release, or to 0.7 (and it might help inform the Helix integration).


          People

          • Assignee:
            Aimee Cheng
            Reporter:
            Aimee Cheng
          • Votes:
            0
            Watchers:
            3
