Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: REEF
    • Labels: None

      Description

      We need to run REEF Tasks on Spark Executors. Ideally, that should require only a few lines of changes in the REEF application configuration. All Spark-related logic must be encapsulated in the reef-runtime-spark module, similar to the existing runtimes, e.g. reef-runtime-yarn or reef-runtime-local. As a first step, we can have a Java-only solution, but later we'll need to run .NET Tasks on Executors as well.

      P.S. Here's a REEF Wiki page with more details: Spark+REEF integration

      Attachments

      1. file.jpeg (1.01 MB) - Saikat Kanjilal
      2. file-1.jpeg (947 kB) - Saikat Kanjilal
      3. REEF_1791_Workflow.jpg (173 kB) - Saikat Kanjilal

        Activity

        kanjilal Saikat Kanjilal added a comment -

        Sergiy Matusevych The work on this has begun; here's where the code lives: https://github.com/skanjila/reef/tree/reef-1791

        I have done the following:
        1) Created a new Maven project called reef-runtime-spark
        2) Imported spark-core and spark-sql into the pom file for now
        3) Brought over all of the reef-runtime-mesos code for now and renamed all the Mesos classes to Spark
        4) Added some stubbed-out empty unit tests named Sparkxxx, where xxx stands for client, driver, etc.

        I will now take a deep dive into the design and add that design to this JIRA. Let me know your thoughts on this approach.

        motus Sergiy Matusevych added a comment -

        Hi Saikat,

        Thanks a lot for your work! That's a great start, and I will look at your code more thoroughly later this week.

        I have only one question that I would like to discuss with the REEF devs: do we really want to bring a Spark dependency into REEF, or should we create an independent project for the Spark integration instead? Having reef-runtime-spark in the reef project tree would be the easiest way to go; besides, we already have, e.g., the Mesos runtime there. OTOH, the fewer dependencies we have, the better. What do you guys think? Markus Weimer?

        markus.weimer Markus Weimer added a comment -

        I'm OK with adding the dependency, as long as it stays contained within the new project.

        motus Sergiy Matusevych added a comment -

        Yes, the dependency is (and should be) restricted to the reef-runtime-spark Maven project.

        Having it under REEF also means that we can continue using REEF JIRA for issue tracking and REEF GitHub for pull requests - which simplifies things a lot.

        Then I can probably move my Spark+REEF example under, say, reef-examples-spark module. This will give us a reference for Spark-related issue tracking, as well as a place to have Spark-related unit tests.

        kanjilal Saikat Kanjilal added a comment -

        Sergiy Matusevych Markus Weimer I will move ahead with my approach based on this discussion. I will:
        1) Read through the code I moved over in detail to get a deep understanding of the workflow
        2) Create a design doc on the approach for Spark
        3) And, of course, get coding ))) and add unit tests

        On number 3, I noticed that a few of the other REEF runtimes don't have unit tests; that may be something to think about adding in the future.

        More to come as I make more progress, stay tuned.

        kanjilal Saikat Kanjilal added a comment - - edited

        Sergiy Matusevych Markus Weimer

        First cut of the design, several options:

        I spent some time researching the design for this runtime, and there are a couple of ways to tackle the problem. Both options assume that Spark executors are already available and that we can invoke one of them and launch our REEF task on it.

        Option 1:
        Spark jobs can be launched and monitored through Livy, a REST API server that sits in front of the Spark cluster. We could create a reef-rest-client that packages up the parameters and internally uses Livy to make a REST call into the Spark cluster to execute the REEF task. There are some things to work out here, namely the role of the driver/evaluator. My initial thinking is that the driver launches and manages the evaluator, and the evaluator in turn uses Livy to make the REST calls and monitor the Spark job. One issue with this is that this was not really the original goal of the evaluators; I'm open to expanding their responsibility, but we'd need to discuss the details a bit further.
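
        As a rough illustration of Option 1, here is a minimal sketch of submitting a batch job through Livy's REST API (POST /batches) using only the JDK HTTP client. The Livy endpoint, jar path, and class name below are hypothetical placeholders, and a real reef-rest-client would also poll GET /batches/{id}/state to monitor the job.

{code:scala}
import java.io.OutputStreamWriter
import java.net.{HttpURLConnection, URL}
import scala.io.Source

object LivySubmitSketch {
  // Submit a jar as a Livy batch session and return Livy's JSON response.
  def submit(livyUrl: String, jar: String, mainClass: String): String = {
    val payload = s"""{"file": "$jar", "className": "$mainClass", "args": []}"""
    val conn = new URL(s"$livyUrl/batches").openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("POST")
    conn.setRequestProperty("Content-Type", "application/json")
    conn.setDoOutput(true)
    val writer = new OutputStreamWriter(conn.getOutputStream)
    writer.write(payload)
    writer.close()
    Source.fromInputStream(conn.getInputStream).mkString
  }
}

// Hypothetical usage:
// LivySubmitSketch.submit("http://livy-host:8998", "hdfs:///jobs/reef-task.jar", "org.example.ReefTaskMain")
{code}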

        Option 2:
        We make a low-level networking call from the driver/evaluator and leverage spark-submit on the Spark head node to invoke the REEF task. This would essentially require logging into the Spark head node, locating spark-submit, and invoking it with the parameters relevant to the REEF task. For example, for a custom REEF ML algorithm, it would involve executing the code for the algorithm on the Spark executors (very similar to hot-deploying a chunk of Scala or Python code). Can you guys think of some other types of REEF jobs that would leverage this?

        At the end of the day, the Spark head node is responsible for executing a job on the Spark cluster by farming out parts of the job to the various worker nodes, so the REEF task would basically live inside each of the worker nodes; the Spark master node would then combine the results and potentially send them back to the REEF driver/evaluator.

        Some more things to think about:
        1) What types of REEF jobs would it be advantageous to run on Spark executors?
        2) Spark has its own monitoring through Livy; should REEF leverage this, or come up with its own monitoring to track progress on its jobs?
        3) Should a REEF job be executed in Scala/Python, or should it live at a higher level? (I was thinking REEF should specify the "what" of the job, as opposed to the "how", since Spark is specifically solving the "how".)
        4) Here are some docs on Livy on HDInsight:
        https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-apache-spark-livy-rest-interface

        Let me know your thoughts; I would love in-person discussions as well if needed.

        minterlandi Matteo Interlandi added a comment -

        Hi Saikat,
        could you please explain why we need such deep integration with Spark to get the REEF runtime working on Spark? In theory, if we can add a dependency on Spark, one can simply run a mapPartitions job over properly created partitions holding the resources, where each task spawns an Evaluator. This design requires no deep integration with the Spark runtime, and it is what other libraries like TensorFlowOnSpark adopt.
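
        A minimal sketch of the approach Matteo describes, under the assumption that a Spark dependency is available: run one long-lived task per partition and let each task host an Evaluator. The launchEvaluator helper below is a hypothetical placeholder for whatever actually bootstraps the REEF Evaluator inside the executor.

{code:scala}
import org.apache.spark.{SparkConf, SparkContext}

object EvaluatorsOnExecutorsSketch {
  // Placeholder: a real integration would start the Evaluator here
  // (spawn its process or run it in-JVM) and block until it exits.
  def launchEvaluator(partitionId: Int): String =
    s"evaluator-$partitionId started on ${java.net.InetAddress.getLocalHost.getHostName}"

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("reef-on-spark-sketch"))
    val numEvaluators = 4
    // One partition per desired Evaluator; the partition index gives a stable id.
    val report = sc.parallelize(0 until numEvaluators, numEvaluators)
      .mapPartitionsWithIndex { (id, _) => Iterator(launchEvaluator(id)) }
      .collect()
    report.foreach(println)
    sc.stop()
  }
}
{code}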

        kanjilal Saikat Kanjilal added a comment - - edited

        Matteo Interlandi, thanks for the feedback. If you look at the Livy interface, it is the most lightweight option, as it requires no bindings to the Spark runtime. The second option I identified is similar to what you are describing, although I'm not sure what you mean by mapPartitions, because Option 2 abstracts this away from us: if we can assume that a set of Spark executors is readily available to us, then we can just pass the REEF task to the executors, at which point the master node will manage partitioning the data and doing the work necessary for the algorithm. I will look at TensorFlowOnSpark, but I suspect its architecture may be different from ours, although the goal is conceivably the same.

        markus.weimer Markus Weimer added a comment -

        I think it might help to enumerate what a "REEF runtime" actually is and then discuss which parts of it we want on Spark. A REEF runtime consists of two distinct, potentially even separable pieces:

        REEF Client: An implementation of the interfaces necessary to submit a REEF Driver to a resource manager for execution. In the case of YARN, this would mean the submission of an Application Master, for example.

        REEF Driver: On the Driver side, a runtime consists of the implementations of all the interfaces necessary to process Evaluator requests, generate AllocatedEvaluator events and launch the actual Evaluators. In the YARN example, much of this boils down to 1:1 mappings between the REEF and YARN concepts. Another example: in the case of the local runtime, this part is a bit more involved, as it has to actually spawn the processes.

        The Spark Runtime launches a REEF Job from an existing Spark job. Hence, we don't need a client as much as Sergiy Matusevych's work on running the REEF Driver in the same JVM as the Spark Driver. There is no "submission" of a job to a "cluster". Hence, this is more or less already solved.

        Now, for the Driver APIs, I think we can indeed rely on Spark constructs as Matteo Interlandi suggested. The typical case would be to ask for one Evaluator (represented as an ActiveContext) per Spark Executor or RDD partition. Different from all other REEF runtimes, those Evaluators would "just show up" without being asked for.

        Which leaves the question of how to ask for additional Evaluators on Spark. For that, I see two options: (1) Ask YARN or (2) ask Spark. Not sure how to do the second one of these, though.
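
        For reference, a minimal sketch of the Driver-side request path being discussed (Scala calling the REEF Java API; the builder methods are written from memory of org.apache.reef.driver.evaluator, so treat the exact signatures as an assumption). On Spark, the first batch of Evaluators would instead "just show up" as described above, and only additional Evaluators would go through something like this.

{code:scala}
import org.apache.reef.driver.evaluator.{EvaluatorRequest, EvaluatorRequestor}

class AdditionalEvaluatorsSketch(requestor: EvaluatorRequestor) {
  // Ask the resource manager (option 1: YARN) for `count` more Evaluators.
  def requestEvaluators(count: Int): Unit =
    requestor.submit(
      EvaluatorRequest.newBuilder()
        .setNumber(count)       // how many Evaluators to ask for
        .setMemory(2048)        // MB per Evaluator
        .setNumberOfCores(1)
        .build())
}
{code}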

        kanjilal Saikat Kanjilal added a comment -

        These are screenshots of design discussions between me and Sergiy Matusevych; we will have another one this week or early next week.

        kanjilal Saikat Kanjilal added a comment -

        Some additional thoughts around the design:
        1) We will try to piggyback off the Spark+REEF integration already prototyped by Sergiy Matusevych; the code is here: https://github.com/apache/reef/blob/master/lang/scala/reef-examples-scala/src/main/scala/org/apache/reef/examples/hellospark/ReefOnSpark.scala
        2) The REEF Evaluators will be hosted by the Spark executors and will implement the heartbeat protocol; the REEF Driver will communicate with these Evaluators
        3) The next step will be to figure out a minimal viable Spark runtime; I will take a look at the DataLoader function to kickstart this: https://github.com/apache/reef/blob/master/lang/java/reef-examples/src/main/java/org/apache/reef/examples/data/loading/DataLoadingREEF.java#L102

        Will update the JIRA with an initial pull request as design discussions progress further.

        kanjilal Saikat Kanjilal added a comment - - edited

        Markus Weimer Sergiy Matusevych a few observations on deeper inspection:

        1) As I perused the example code we've written for REEF on Spark, it seems that it uses the scala-arm library, which does automatic resource management. Please correct me if I'm wrong, but I think reef-runtime-spark should use the resource management enabled by YARN and not this library. Am I correct here, or are the two pieces different?
        2) The example simply prints out hello world as a REEF task. I'd actually like to invoke one of the Spark ML APIs from REEF as a success criterion for this (case in point: linear regression or logistic regression). This would of course involve some research into the exact events that REEF needs to listen for.
        3) It seems that the example works inside a loop where it invokes methods on the client and then calls reef.run in an inner loop. I don't think this is how reef-runtime-spark should run; in fact, I propose a cleaner approach where reef-runtime-spark gets a handle to the client based on the driver configuration and then invokes reef.run on this client.

        Thoughts?

        markus.weimer Markus Weimer added a comment -

        1) As I perused the example code we've written for REEF on Spark, it seems that it uses the scala-arm library, which does automatic resource management. Please correct me if I'm wrong, but I think reef-runtime-spark should use the resource management enabled by YARN and not this library. Am I correct here, or are the two pieces different?

        No idea. Maybe Matteo Interlandi can help?

        2) The example simply prints out hello world as a REEF task. I'd actually like to invoke one of the Spark ML APIs from REEF as a success criterion for this (case in point: linear regression or logistic regression). This would of course involve some research into the exact events that REEF needs to listen for.

        No, the goal here is to use REEF to run ML algos built on REEF on top of data (and resources) managed by Spark. Hence, running REEF tasks is a more relevant success criterion than calling Spark features.

        3) It seems that the example works inside a loop where it invokes methods on the client and then calls reef.run in an inner loop. I don't think this is how reef-runtime-spark should run; in fact, I propose a cleaner approach where reef-runtime-spark gets a handle to the client based on the driver configuration and then invokes reef.run on this client.

        I am not sure I follow. Can you elaborate?

        kanjilal Saikat Kanjilal added a comment -

        Markus Weimer take a look at this code; it has the two loops I was talking about:
        https://github.com/apache/reef/blob/master/lang/scala/reef-examples-scala/src/main/scala/org/apache/reef/examples/hellospark/ReefOnSpark.scala

        In my mind, reef-runtime-spark should: 1) set up a driver configuration, 2) instantiate a client based on it, and 3) invoke the REEF task in the same JVM as Spark. A rough sketch of that flow follows below.
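
        A rough sketch of steps 1-3 (Scala calling the REEF Java client API). It assumes the driver configuration and the Spark runtime configuration are built elsewhere, e.g. the way the existing ReefOnSpark example builds them; only the launch flow is shown, and DriverLauncher is used here purely as an illustration of "get a handle to the client and run".

{code:scala}
import org.apache.reef.client.DriverLauncher
import org.apache.reef.tang.Configuration

object ReefOnSparkLaunchSketch {
  def launch(sparkRuntimeConf: Configuration, driverConf: Configuration): Unit = {
    // 2) Instantiate the client/launcher from the (hypothetical) Spark runtime configuration...
    val launcher = DriverLauncher.getLauncher(sparkRuntimeConf)
    // 3) ...and run the REEF Driver from the JVM that already runs the Spark driver.
    val status = launcher.run(driverConf)
    println(s"REEF job finished with status: $status")
  }
}
{code}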

        markus.weimer Markus Weimer added a comment -

        Those aren't loops. That is Scala's idiom for try-with-resources, and it makes sure the resources are properly {{.close()}}ed.
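
        For reference, a minimal sketch of that idiom as the scala-arm library exposes it (the resource type here is just an illustrative stand-in, not the actual class from ReefOnSpark.scala):

{code:scala}
import resource.managed

final class FakeClient extends AutoCloseable {
  def run(): Unit = println("running REEF job")
  override def close(): Unit = println("client closed")
}

object ArmIdiomSketch extends App {
  // Looks like a loop, but the generator "iterates" exactly once over the
  // managed resource and guarantees close() runs when the block exits.
  for (client <- managed(new FakeClient)) {
    client.run()
  }
}
{code}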

        kanjilal Saikat Kanjilal added a comment -

        Pull request: https://github.com/apache/reef/pull/1324

        kanjilal Saikat Kanjilal added a comment - - edited

        Sergiy Matusevych can you take a look at this pull request? I have done the following since our last discussion:
        1) Removed a bunch of code that is not needed in this module
        2) Added a custom listener which extends the SparkListener interface and lets us listen for the Spark events coming from Spark within the REEF app (a sketch of the idea follows below)
        3) Moved the example code from the src/test location to the src/main location
        I will now start removing the YARN-related configs from the example code.
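
        For illustration, a hedged sketch of such a listener (not the actual class from the PR): extend org.apache.spark.scheduler.SparkListener, override the callbacks of interest, and register it on the SparkContext so the REEF side can observe Spark events.

{code:scala}
import org.apache.spark.SparkContext
import org.apache.spark.scheduler.{SparkListener, SparkListenerApplicationEnd, SparkListenerTaskEnd}

final class ReefSparkListenerSketch extends SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit =
    println(s"Spark task finished in stage ${taskEnd.stageId}")

  override def onApplicationEnd(applicationEnd: SparkListenerApplicationEnd): Unit =
    println(s"Spark application ended at ${applicationEnd.time}")
}

object ListenerRegistrationSketch {
  def register(sc: SparkContext): Unit =
    sc.addSparkListener(new ReefSparkListenerSketch)
}
{code}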

        Pull request updated here: https://github.com/apache/reef/pull/1324/files

        motus Sergiy Matusevych added a comment -

        Here's a REEF Wiki page with more details: Spark+REEF integration

        kanjilal Saikat Kanjilal added a comment -

        Sergiy Matusevych My questions from the proposal:
        1) What are the negatives of running in unmanaged AM mode? What happens if the code runs into any performance issues, how will it recover, and how will the user manage this? It seems like this places more responsibility on the user, who may or may not have this knowledge.
        2) I would like to see an end-to-end user interaction diagram; maybe we can discuss this when we meet.
        3) Is it possible to make the partitions configurable in the DataLoader service? In general, I'd like to understand how this can be specified.
        4) What are the tradeoffs between using the EvaluatorRequestor versus the DataLoader? If the goal is to not have too much dependency on Spark internals, it seems like the DataLoader is a better approach.
        5) I would postpone the low-level Spark API until the first part using the EvaluatorRequestor or the DataLoader is complete.
        6) In the REEF .NET bridge, I would recommend launching a .NET VM as a separate process to avoid using JNI and not being able to use spark-submit.

        Let me know your thoughts and we can meet in person next.

        motus Sergiy Matusevych added a comment - - edited

        1) What are the negatives of running in unmanaged AM mode? What happens if the code runs into any performance issues, how will it recover, and how will the user manage this? It seems like this places more responsibility on the user, who may or may not have this knowledge.

        By design, REEF assumes that the user takes full responsibility for the app. This is done because we want the user to be in control as much as possible while providing sane defaults. Running REEF from Spark is no different - we assume that the user will implement all the necessary event handlers for the failure events if the defaults are not sufficient for the use case. What is different for the Unmanaged AM mode is that the REEF Driver launched from Spark must also respond to the (failure) events originating from Spark, and we currently do not have mechanisms to forward Spark events to the REEF app transparently - the user has to do it by hand. Other than Spark-REEF event forwarding, all other issues that you mention – performance and error recovery, usability, etc. – are not directly relevant to this PR, and we can discuss them elsewhere.

        2) I would like to see an end-to-end user interaction diagram; maybe we can discuss this when we meet.

        Let's talk about it in the meeting and post a picture here.

        3) Is it possible to make the partitions configurable in the DataLoader service? In general, I'd like to understand how this can be specified.

        I am not sure what parameters you are talking about. Please take a look at the DataLoadingRequestBuilder and let me know what other parameters we might need for Spark integration.
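
        (For concreteness, a hedged sketch of the DataLoader-based configuration in Scala; the builder method names are written from memory of the DataLoadingREEF example and should be double-checked against the current DataLoadingRequestBuilder API. The number of desired splits is the partitioning knob the question above is about.)

{code:scala}
import org.apache.hadoop.mapred.TextInputFormat
import org.apache.reef.io.data.loading.api.DataLoadingRequestBuilder
import org.apache.reef.tang.Configuration
import org.apache.reef.tang.formats.ConfigurationModule

object DataLoadingSketch {
  def build(inputPath: String, numSplits: Int, driverModule: ConfigurationModule): Configuration =
    new DataLoadingRequestBuilder()
      .setMemoryMB(1024)                    // memory per data-loading Evaluator
      .setInputFormatClass(classOf[TextInputFormat])
      .setInputPath(inputPath)
      .setNumberOfDesiredSplits(numSplits)  // i.e. how many partitions/Evaluators
      .setDriverConfigurationModule(driverModule)
      .build()
}
{code}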

        4) What are the tradeoffs between using the EvaluatorRequestor versus the DataLoader? If the goal is to not have too much dependency on Spark internals, it seems like the DataLoader is a better approach.

        In my opinion, EvaluatorRequestor is more flexible as it allows us to request additional partitions (and potentially the new datasets) at runtime. OTOH, DataLoader can be easier to implement and it should cover 99% of our needs. In the long run, we may end up with both approaches implemented.

        5) I would postpone the low-level Spark API until the first part using the EvaluatorRequestor or the DataLoader is complete.

        That depends on how hard it is to implement the `EvaluatorRequestor` using the low-level Spark API. If done properly, it can give us a proper REEF+Spark runtime that is completely transparent to the end user; then we won't need any workarounds like DataLoader or a custom data-driven SparkEvaluatorRequestor. Still, I would much prefer a workaround that would allow us to move forward with Spark+REEF.NET integration now and come back to the low-level solution later.

        6) In the REEF .NET bridge, I would recommend launching a .NET VM as a separate process to avoid using JNI and not being able to use spark-submit.

        I agree.
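
        (A tiny sketch of what "separate process" could look like from the JVM side; the executable and arguments are hypothetical placeholders.)

{code:scala}
object DotNetEvaluatorLauncherSketch {
  def launch(evaluatorConfigPath: String): Process = {
    // Start the .NET side as its own OS process instead of hosting it via JNI.
    val pb = new ProcessBuilder("dotnet", "ReefDotNetEvaluator.dll", evaluatorConfigPath)
    pb.inheritIO()   // forward stdout/stderr so the output lands in the Evaluator logs
    pb.start()       // the JVM side can then monitor or await this process
  }
}
{code}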

        kanjilal Saikat Kanjilal added a comment -

        Sergiy Matusevych I added a SparkEvaluatorRequestor to the pull request. I think we should work on that approach and get it working end to end before implementing the DataLoader piece. We can discuss this when we meet.

        kanjilal Saikat Kanjilal added a comment -

        Design of initial workflow


          People

          • Assignee: kanjilal Saikat Kanjilal
          • Reporter: motus Sergiy Matusevych
          • Votes: 0
          • Watchers: 3


          Time Tracking

          • Original Estimate: 1,344h
          • Remaining Estimate: 1,344h
          • Time Spent: Not Specified
