1) What are the downsides of running in Unmanaged AM mode? What happens if the code runs into performance issues, how will it recover, and how will the user manage this? It seems like this places more responsibility on the user, who may or may not have this knowledge.
By design, REEF assumes that the user takes full responsibility for the app. We do this because we want the user to be in control as much as possible while providing sane defaults. Running REEF from Spark is no different: we assume that the user will implement all the necessary event handlers for the failure events if the defaults are not sufficient for the use case. What is different in Unmanaged AM mode is that the REEF Driver launched from Spark must also respond to the (failure) events originating from Spark, and we currently have no mechanism to forward Spark events to the REEF app transparently; the user has to do it by hand. Other than Spark-REEF event forwarding, the other issues you mention – performance and error recovery, usability, etc. – are not directly relevant to this PR, and we can discuss them elsewhere.
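To make "implement the necessary event handlers" concrete, here is a minimal sketch of a custom evaluator-failure handler on the REEF side. The handler class name and its recovery logic are hypothetical; only the `EventHandler<FailedEvaluator>` interface and the `DriverConfiguration.ON_EVALUATOR_FAILED` binding come from REEF itself:

```java
import javax.inject.Inject;
import org.apache.reef.driver.evaluator.FailedEvaluator;
import org.apache.reef.wake.EventHandler;

// Hypothetical handler: replaces the default reaction to evaluator failures.
final class SparkAwareFailureHandler implements EventHandler<FailedEvaluator> {

  @Inject
  SparkAwareFailureHandler() {
  }

  @Override
  public void onNext(final FailedEvaluator failedEvaluator) {
    // Application-specific recovery goes here; in the Spark case this is
    // also where the user would forward the failure to Spark by hand.
    System.err.println("Evaluator failed: " + failedEvaluator.getId());
  }
}

// The handler is then bound in the Driver configuration, e.g.:
// DriverConfiguration.CONF
//     .set(DriverConfiguration.ON_EVALUATOR_FAILED, SparkAwareFailureHandler.class)
//     ...
```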
2) I would like to see an end-to-end user interaction diagram; maybe we can discuss this when we meet.
Let's talk about it in the meeting and post a picture here.
3) Is it possible to make the partitions configurable in the DataLoader service? In general, I'd like to understand how this can be specified.
I am not sure what parameters you are talking about. Please take a look at the DataLoadingRequestBuilder and let me know what other parameters we might need for Spark integration.
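For reference, the `DataLoadingRequestBuilder` already exposes a desired-split count, which is the closest existing knob to a partition setting. A sketch of the configuration; the input path, memory size, and split count are made-up values, and `driverConf` stands in for the app's own Driver configuration module:

```java
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.reef.io.data.loading.api.DataLoadingRequestBuilder;
import org.apache.reef.tang.Configuration;

// Sketch: configure the data-loading service with an explicit number of
// splits (partitions). All concrete values here are hypothetical.
final Configuration dataLoadConf = new DataLoadingRequestBuilder()
    .setInputFormatClass(TextInputFormat.class)
    .setInputPath("hdfs:///tmp/input")        // hypothetical input path
    .setNumberOfDesiredSplits(8)              // the "partitions" knob
    .setMemoryMB(1024)                        // evaluator memory per split
    .setDriverConfigurationModule(driverConf) // the app's Driver config module
    .build();
```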
4) What are the tradeoffs between using the EvaluatorRequestor and the DataLoader? If the goal is to avoid depending too much on Spark internals, it seems like the DataLoader is a better approach.
In my opinion, EvaluatorRequestor is more flexible, as it allows us to request additional partitions (and potentially new datasets) at runtime. OTOH, DataLoader can be easier to implement, and it should cover 99% of our needs. In the long run, we may end up with both approaches implemented.
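As an illustration of that flexibility, growing the evaluator pool at runtime is a single `EvaluatorRequestor.submit` call. A sketch, assuming one evaluator per new partition; the class name, helper, memory, and core counts are all made-up values:

```java
import org.apache.reef.driver.evaluator.EvaluatorRequest;
import org.apache.reef.driver.evaluator.EvaluatorRequestor;

// Sketch: grow the set of evaluators as new partitions appear.
final class PartitionGrower {

  // Pure helper: how many more evaluators we need for the given partitions.
  static int evaluatorsNeeded(final int numPartitions, final int numRunning) {
    return Math.max(0, numPartitions - numRunning);
  }

  static void requestEvaluators(final EvaluatorRequestor requestor,
                                final int numPartitions, final int numRunning) {
    final int delta = evaluatorsNeeded(numPartitions, numRunning);
    if (delta > 0) {
      requestor.submit(EvaluatorRequest.newBuilder()
          .setNumber(delta)   // one evaluator per new partition (assumption)
          .setMemory(1024)    // MB per evaluator (made-up value)
          .setNumberOfCores(1)
          .build());
    }
  }
}
```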
5) I would postpone the low-level Spark API until the first part, using the EvaluatorRequestor or the DataLoader, is complete.
That depends on how hard it is to implement the `EvaluatorRequestor` using the low-level Spark API. If done properly, it can give us a proper REEF+Spark runtime that is completely transparent to the end user; then we won't need workarounds like the DataLoader or a custom data-driven SparkEvaluatorRequestor. Still, I would much prefer a workaround that allows us to move forward with Spark+REEF.NET integration now and come back to the low-level solution later.
6) In the REEF.NET bridge, I would recommend launching the .NET VM as a separate process, to avoid using JNI and losing the ability to use spark-submit.