Zeppelin is now only able to support corntab. A note is executed periodically at a specified time.
In the actual operating environment, The way through corntab is too simple, Workflow orchestration for paragraphs of different interpreters in multiple notes (or a note) in a specific execution order cannot be supported.
We created a lot of notes in our zeppelin, We urgently need zeppelin to support the layout of the workflow. This can form a closed loop of data processing. Not just an interactive development tool.
Especially in machine learning, Because machine learning generally has a long task execution.
A typical example is as follows:
1) First, obtain data from HDFS through spark;
2) Clean and convert the data through sparksql;
3) Feature extraction of data through spark;
4) Tensorflow writing algorithm through hadoop submarine;
5) Distribute the tensorflow algorithm as a job to YARN or k8s for batch processing;
6) Publish the training acquisition model and provide online prediction services;
7) Model prediction by flink;
8) Receive incremental data through flink for incremental update of the model;
Therefore, zeppelin is especially required to have the ability to arrange workflows.