[AIRFLOW-4796] DOCO - DaskExecutor logs


    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.10.3
    • Fix Version/s: None
    • Component/s: executors, logging
    • Labels: None

      Description

      I have an Airflow installation (on Kubernetes) that uses the DaskExecutor, and remote logging to S3 is configured. However, while a task is running I cannot see its log; instead I get this error:

          • Log file does not exist: /airflow/logs/dbt/run_dbt/2018-11-01T06:00:00+00:00/3.log
          • Fetching from: http://airflow-worker-74d75ccd98-6g9h5:8793/log/dbt/run_dbt/2018-11-01T06:00:00+00:00/3.log
          • Failed to fetch log file from worker. HTTPConnectionPool(host='airflow-worker-74d75ccd98-6g9h5', port=8793): Max retries exceeded with url: /log/dbt/run_dbt/2018-11-01T06:00:00+00:00/3.log (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f7d0668ae80>: Failed to establish a new connection: [Errno -2] Name or service not known',))

       

      Once the task is done, the log is shown correctly.

      I believe what Airflow does is:

      • for finished tasks, read the log from S3
      • for running tasks, connect to the executor's log-server endpoint and show that.

      It looks like Airflow is using celery.worker_log_server_port to connect to my Dask worker and fetch the log from there.
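
      For illustration, the request the webserver makes for a running task's log can be reproduced by hand from the webserver pod; the hostname, port and path below are simply the ones from the error message above, so this is a quick way to confirm whether anything is listening there:

      curl -v "http://airflow-worker-74d75ccd98-6g9h5:8793/log/dbt/run_dbt/2018-11-01T06:00:00+00:00/3.log"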

      How can the DaskExecutor be configured to expose a log-server endpoint?

      My configuration:

      [core]
      remote_logging = True
      remote_base_log_folder = s3://some-s3-path
      executor = DaskExecutor

      [dask]
      cluster_address = 127.0.0.1:8786

      [celery]
      worker_log_server_port = 8793
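
      The same settings can also be supplied as environment variables via Airflow's AIRFLOW__<SECTION>__<KEY> convention, which is often more convenient on Kubernetes; the values below just mirror the configuration above:

      AIRFLOW__CORE__REMOTE_LOGGING=True
      AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER=s3://some-s3-path
      AIRFLOW__CORE__EXECUTOR=DaskExecutor
      AIRFLOW__DASK__CLUSTER_ADDRESS=127.0.0.1:8786
      AIRFLOW__CELERY__WORKER_LOG_SERVER_PORT=8793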


      What I verified:

      • the log file exists and is being written to on the worker while the task is running
      • netstat -tunlp inside the worker container does not show any extra port being listened on from which logs could be served
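
      As a rough sketch, these are the kinds of commands used for those checks inside the worker container; the log path and port are the ones appearing in the error and configuration above:

      # is the log file there and still growing while the task runs?
      ls -l /airflow/logs/dbt/run_dbt/2018-11-01T06:00:00+00:00/3.log
      tail -f /airflow/logs/dbt/run_dbt/2018-11-01T06:00:00+00:00/3.log

      # is anything listening on the expected log-server port?
      netstat -tunlp | grep 8793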

      We worked around the problem by simply starting a Python HTTP server on each worker.

      Dockerfile:

       
      # directory that will be served over HTTP; the Airflow log directory is
      # symlinked into it as "log" so that /log/<dag>/<task>/<ts>/<try>.log
      # URLs resolve to the actual log files
      RUN mkdir -p $AIRFLOW_HOME/serve
      RUN ln -s $AIRFLOW_HOME/logs $AIRFLOW_HOME/serve/log

      worker.sh (run by Docker CMD):

       
      #!/usr/bin/env bash

      # serve $AIRFLOW_HOME/serve (which contains the "log" symlink) on the
      # port the webserver expects, i.e. [celery] worker_log_server_port
      cd "$AIRFLOW_HOME/serve"
      python3 -m http.server 8793 &
      cd -

      # then start the actual Dask worker, forwarding any arguments
      dask-worker "$@"
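
      An alternative, untested sketch: Airflow's own serve_logs CLI command (the one the Celery worker starts internally to serve its log directory on worker_log_server_port) should presumably work in place of python3 -m http.server, e.g.:

      #!/usr/bin/env bash

      # hypothetical variant of worker.sh that reuses Airflow's log server
      airflow serve_logs &
      dask-worker "$@"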


      See https://stackoverflow.com/questions/53121401/airflow-live-executor-logs-with-daskexecutor



            People

            • Assignee: Unassigned
            • Reporter: toopt4 (t oo)
            • Votes: 0
            • Watchers: 2
