Apache Airflow / AIRFLOW-4796

DOCO - DaskExecutor logs



    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.10.3
    • Fix Version/s: None
    • Component/s: executors, logging
    • Labels:


      I have an Airflow installation (on Kubernetes). My setup uses the DaskExecutor, and I have configured remote logging to S3. However, while a task is running I cannot see its log; I get this error instead:

          • Log file does not exist: /airflow/logs/dbt/run_dbt/2018-11-01T06:00:00+00:00/3.log
          • Fetching from: http://airflow-worker-74d75ccd98-6g9h5:8793/log/dbt/run_dbt/2018-11-01T06:00:00+00:00/3.log
          • Failed to fetch log file from worker. HTTPConnectionPool(host='airflow-worker-74d75ccd98-6g9h5', port=8793): Max retries exceeded with url: /log/dbt/run_dbt/2018-11-01T06:00:00+00:00/3.log (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f7d0668ae80>: Failed to establish a new connection: [Errno -2] Name or service not known',))


      Once the task is done, the log is shown correctly.

      I believe what Airflow is doing is:

      • for finished tasks, read logs from S3
      • for running tasks, connect to the worker's log server endpoint and show the log from there.
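
      The second step can be sketched as follows. This is a simplified illustration of the URL the webserver polls, not Airflow's actual code (build_worker_log_url is a hypothetical helper; the real logic lives in Airflow's file task handler):

      ```python
      # Simplified sketch of the URL the Airflow webserver polls for a
      # running task's log. build_worker_log_url is a hypothetical helper,
      # not a real Airflow API.
      def build_worker_log_url(hostname: str, port: int, log_relative_path: str) -> str:
          """Join the worker hostname, log-server port, and relative log path."""
          return f"http://{hostname}:{port}/log/{log_relative_path}"

      # Values taken from the error message above:
      url = build_worker_log_url(
          "airflow-worker-74d75ccd98-6g9h5",               # worker hostname
          8793,                                            # celery.worker_log_server_port
          "dbt/run_dbt/2018-11-01T06:00:00+00:00/3.log",   # log path for this task try
      )
      ```

      When the worker exposes nothing on that port, the connection fails with exactly the NewConnectionError shown above.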

      It looks like Airflow is using celery.worker_log_server_port to connect to my Dask worker and fetch logs from there.

      How can the DaskExecutor be configured to expose a log server endpoint?

      My configuration (airflow.cfg):

      [core]
      remote_logging = True
      remote_base_log_folder = s3://some-s3-path
      executor = DaskExecutor

      [dask]
      cluster_address =

      [celery]
      worker_log_server_port = 8793



      What I verified:

      • the log file exists and is being written to on the executor while the task is running
      • netstat -tunlp inside the executor container shows no extra port exposed from which logs could be served.




      We solved the problem by simply starting a Python HTTP server on the worker.


      Dockerfile:

      RUN mkdir -p $AIRFLOW_HOME/serve
      RUN ln -s $AIRFLOW_HOME/logs $AIRFLOW_HOME/serve/log

      worker.sh (run by Docker CMD):

      #!/usr/bin/env bash
      # Serve the Airflow log directory on the port the webserver polls
      # (celery.worker_log_server_port, 8793 by default), then start the worker.

      cd "$AIRFLOW_HOME/serve"
      python3 -m http.server 8793 &

      cd -
      dask-worker "$@"
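
      To confirm the sidecar server is reachable from another pod, a quick probe like the following can help (log_server_alive is a hypothetical helper; substitute your worker's hostname):

      ```python
      # Probe a worker's log server. Hypothetical helper; the hostname in
      # the example comes from the error message above and must be replaced
      # with your own worker's hostname.
      from urllib.request import urlopen

      def log_server_alive(host: str, port: int = 8793) -> bool:
          """Return True if http://host:port/ answers with HTTP 200."""
          try:
              with urlopen(f"http://{host}:{port}/", timeout=5) as resp:
                  return resp.status == 200
          except OSError:
              return False

      # Example:
      # log_server_alive("airflow-worker-74d75ccd98-6g9h5")
      ```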




      See https://stackoverflow.com/questions/53121401/airflow-live-executor-logs-with-daskexecutor for the original discussion.






            • Assignee: toopt4 (t oo)
            • Votes: 0
            • Watchers: 2