Queued Airflow tasks sometimes get stuck for a long time without being picked up. In many cases the root cause is hard to debug.
The proposed solution is to add requeue logic to the Airflow scheduler and executor. For each task whose state is queued in the metadata DB, if the task has not appeared in executor.queued_tasks within a certain interval, we requeue it so that stuck tasks are released and picked up. This approach has been used at Airbnb and has proven helpful.
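The requeue check described above can be sketched roughly as follows. This is a simplified, standalone illustration and not the code in this change: `QueuedTask`, `find_stuck_tasks`, `requeue_stuck_tasks`, and the plain dict standing in for executor.queued_tasks are all hypothetical names chosen for the example, and the interval value is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class QueuedTask:
    """Hypothetical stand-in for a task instance row with state 'queued' in the metadata DB."""
    key: str
    queued_at: datetime


def find_stuck_tasks(db_queued, executor_queued_keys, now, interval):
    """Return tasks that have been queued in the DB for longer than `interval`
    and are not present in the executor's queued tasks."""
    return [
        task
        for task in db_queued
        if task.key not in executor_queued_keys and now - task.queued_at > interval
    ]


def requeue_stuck_tasks(db_queued, executor_queued, now, interval):
    """Add stuck tasks back to the (hypothetical) executor queue and return their keys."""
    stuck = find_stuck_tasks(db_queued, executor_queued.keys(), now, interval)
    for task in stuck:
        executor_queued[task.key] = task
    return [task.key for task in stuck]


# Example: "a" is queued in the DB but unknown to the executor, so it is requeued.
now = datetime(2020, 1, 1, 12, 0)
db_queued = [
    QueuedTask("a", now - timedelta(minutes=20)),  # stuck: old and not in executor
    QueuedTask("b", now - timedelta(minutes=20)),  # old, but the executor has it
    QueuedTask("c", now - timedelta(minutes=1)),   # not in executor, but still recent
]
executor_queued = {"b": db_queued[1]}
requeued = requeue_stuck_tasks(db_queued, executor_queued, now, timedelta(minutes=10))
```

Waiting for the interval before requeueing gives a task that was only just queued time to reach the executor, so it is not treated as stuck prematurely.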