Details
Type: Task
Status: Open
Priority: Major
Resolution: Unresolved
Description
About pydolphinscheduler
PyDolphinScheduler is the Python API for Apache DolphinScheduler. It lets you define workflows in Python code, an approach known as workflow-as-code. See the PyDolphinScheduler documentation [4] for more detail. All of its source code is kept as a submodule in the main DolphinScheduler codebase [5].
The Goal
Make pydolphinscheduler's CLI more powerful: it should be able to operate on DolphinScheduler's models, run pydolphinscheduler code, and visualize a workflow's DAG in the terminal.
Detail
So far, the Apache DolphinScheduler Python API's CLI supports only a limited set of commands. Our community wants it to become a more powerful tool that supports as many commands as possible (unless a command raises a security issue).
It only supports `version` and `config` for now; see [1] for details.
We think the following commands would be helpful for the CLI. You may propose other commands as well, but make sure to discuss them with the community first:
- `run <DAG name> [--example]`: Run a local workflow DAG file or a built-in example
- `users`: User operations, CRUD
- `projects`: Project operations, CRUD, granting to other users
- `tenants`: Tenant operations, CRUD
- `workflow`: Workflow operations, CRUD, renaming (a rename should also change the local Python file name)
- `visualize`: Show the task graph in the terminal
- etc...
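As a rough illustration of how the commands above could be laid out with click, here is a minimal sketch. It is not the existing pydolphinscheduler CLI: the function names, the `create` subcommand, and the echoed messages are all hypothetical, and the command bodies are placeholders that do not contact DolphinScheduler.

```python
import click
from click.testing import CliRunner


@click.group()
def cli():
    """Hypothetical top-level command group (illustrative only)."""


@cli.command()
@click.argument("dag_name")
@click.option("--example", is_flag=True, help="Treat DAG_NAME as a built-in example.")
def run(dag_name, example):
    """Run a local workflow DAG file or a built-in example."""
    source = "built-in example" if example else "local file"
    # Placeholder: a real command would load and submit the workflow here.
    click.echo(f"Would run {source}: {dag_name}")


@cli.group()
def users():
    """User operations (CRUD); only `create` is sketched here."""


@users.command("create")
@click.argument("name")
def users_create(name):
    """Create a user (placeholder, no API call is made)."""
    click.echo(f"Would create user: {name}")


def demo():
    """Invoke the sketch in-process, the way a pytest test could."""
    runner = CliRunner()
    result = runner.invoke(cli, ["run", "tutorial", "--example"])
    return result.output
```

Nesting `users` as its own group keeps each resource's CRUD subcommands together, and `CliRunner` allows the commands to be exercised from `pytest` without spawning a subprocess.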
Besides adding functionality, we should also consider the CLI's output and make it clearer and more polished. We may consider using the following packages (and should look for other interesting ones too):
- rich [2]: for highlighting our output, possibly through an existing rich plugin such as `click-rich`
- tabulate [3]: for rendering tables in the terminal
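Before choosing an output package, the terminal output for a `visualize`-style command can be prototyped with the standard library alone. The sketch below is one possible plain-text rendering, not a proposed design: it groups tasks into dependency levels using `graphlib` (Python 3.9+). The `render_dag` function and the shape of its `upstream` argument are assumptions made for illustration and are not part of pydolphinscheduler.

```python
from graphlib import TopologicalSorter


def render_dag(upstream):
    """Render a DAG as text, one line per dependency level.

    `upstream` is a hypothetical input format: it maps each task name
    to the set of task names that must finish before it can start.
    """
    sorter = TopologicalSorter(upstream)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    lines = []
    level = 0
    while sorter.is_active():
        # Every task returned here has all of its upstream tasks done.
        ready = sorted(sorter.get_ready())
        lines.append(f"level {level}: " + ", ".join(ready))
        sorter.done(*ready)
        level += 1
    return "\n".join(lines)


# A small diamond-shaped workflow: a -> (b, c) -> d
EXAMPLE = {"b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
```

Calling `render_dag(EXAMPLE)` yields three lines, with `b` and `c` sharing a level because both depend only on `a`. A package such as rich could later replace the plain strings with styled output.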
What Can You Learn
We hope everyone who joins GSoC learns something from the project. After finishing this project, you will have learned:
- How to write production-level Python code and docs: improving your Python, writing tests with `pytest` and `tox`, writing documentation with `sphinx` and its related plugins, and formatting and linting your Python code
- More about task scheduling systems: what they are, what they focus on, and how they run
If You Are Interested in It
If you want to take this ticket, you should:
- (Must) Have Python skills, especially with packages such as click and pytest.
- Have a little knowledge of task scheduling systems.
- (Optional) Have basic Java knowledge, which helps because the Apache DolphinScheduler core is written in Java and you may need to add some functional code to it.
Mentors
- Calvin Kirs: Committer of Apache {DolphinScheduler, SeaTunnel, Wayang}, DolphinScheduler PMC and SeaTunnel PPMC
- Jiajie Zhong: Committer of Apache {Airflow, DolphinScheduler, SeaTunnel}, SeaTunnel PPMC
[1]: https://dolphinscheduler.apache.org/python/cli.html
[2]: https://github.com/Textualize/rich
[3]: https://github.com/astanin/python-tabulate
[4]: https://dolphinscheduler.apache.org/python/index.html
[5]: https://github.com/apache/dolphinscheduler/tree/dev/dolphinscheduler-python/pydolphinscheduler