Description
Zeppelin currently supports only crontab-style scheduling: a note is executed periodically at a specified time.
In a real operating environment, crontab scheduling is too simple. It cannot orchestrate paragraphs that use different interpreters, across multiple notes (or within a single note), in a specific execution order.
We have created a lot of notes in our Zeppelin, and we urgently need Zeppelin to support workflow orchestration. This would close the data-processing loop and make Zeppelin more than just an interactive development tool.
This matters especially in machine learning, where tasks generally take a long time to execute.
A typical example is as follows:
1) First, obtain data from HDFS through Spark;
2) Clean and convert the data through Spark SQL;
3) Extract features from the data through Spark;
4) Write the TensorFlow algorithm through Hadoop Submarine;
5) Distribute the TensorFlow algorithm as a job to YARN or k8s for batch processing;
6) Publish the model obtained from training and provide online prediction services;
7) Run model prediction through Flink;
8) Receive incremental data through Flink to incrementally update the model.
Therefore, Zeppelin especially needs the ability to orchestrate workflows.
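To make the request concrete, here is a minimal sketch of the kind of ordering a workflow layer would need to enforce for the eight steps above. It is not Zeppelin code and does not reflect the design doc: the step names, the `WORKFLOW` mapping, and the `run_step` callback are all hypothetical, and the dependencies are assumed to form a simple linear chain in the order listed.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical step names mirroring the eight steps listed above.
# Each step maps to the set of steps that must finish before it starts.
# Assumption: the steps form a linear chain in the listed order.
WORKFLOW = {
    "load_from_hdfs": set(),
    "clean_with_spark_sql": {"load_from_hdfs"},
    "extract_features": {"clean_with_spark_sql"},
    "write_tensorflow_algorithm": {"extract_features"},
    "train_on_yarn_or_k8s": {"write_tensorflow_algorithm"},
    "publish_model": {"train_on_yarn_or_k8s"},
    "predict_with_flink": {"publish_model"},
    "incremental_update_with_flink": {"predict_with_flink"},
}


def execution_order(workflow):
    """Return the steps in an order that respects every dependency.

    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    return list(TopologicalSorter(workflow).static_order())


def run_workflow(workflow, run_step):
    """Run steps in dependency order and stop at the first failure.

    run_step is a hypothetical callback that executes one step (for
    example, one paragraph of a note) and returns True on success.
    Returns the list of steps that completed successfully.
    """
    completed = []
    for step in execution_order(workflow):
        if not run_step(step):
            break  # downstream steps depend on this one, so do not run them
        completed.append(step)
    return completed
```

A cron trigger alone cannot express the "stop downstream steps on failure" behavior shown in `run_workflow`, which is the gap this issue describes.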
Please refer to the ongoing design doc and add your thoughts:
https://docs.google.com/document/d/1pQjVifOC1knPBuw3LVvby7GyNDXaeBq1ltRg6x4vDxM/edit?usp=sharing
Attachments
Issue Links
- Dependency: ZEPPELIN-3856 Zeppelin add Hadoop Submarine (machine learning) interpreter (Open)
- is related to: ZEPPELIN-4181 Run corntab scheduled task with an isolated environment (Open)
- links to
Hello everyone, I have completed the workflow system design. Please review it; you can edit the document directly or leave comments.