Uploaded image for project: 'Apache Airflow'
  1. Apache Airflow
  2. AIRFLOW-6569

Broken sentry integration

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.0.0, 1.10.7
    • Fix Version/s: None
    • Component/s: configuration, hooks
    • Labels:
      None

      Description

      I believe the new forking mechanism AIRFLOW-5931 has unintentionally broken the sentry integration.

      Sentry relies on the atexit http://man7.org/linux/man-pages/man3/atexit.3.html signal to flush collected errors to their servers. Previously as the task was executed in a new process as opposed to forked this got invoked. However now os._exit() is called (which is semantically correct with child processes) https://docs.python.org/3/library/os.html#os._exit

      Point os._exit is called in airflow:
      https://github.com/apache/airflow/pull/6627/files#diff-736081a3535ff0b9e60ada2f51154ca4R84

      Also related on sentry bug tracker: https://github.com/getsentry/sentry-python/issues/291

      Unfortunately sentry doesn't provide (from what i can find) a public interface for flushing errors to their system. The return value of their init() functions returns an object containg a client but the property is `_client` so it would be wrong to rely on it.

      I've side stepped this in two ways, you can disable the forking feature through patching CAN_FORK to False. But after seeing the performance improvement on my workers I opted to monkey patch the whole _exec_by_fork() and naughtily call sys.exit instead as a temporary fix.

      I personally dont find the actual sentry integration in airflow useful as it doesn't collect errors from the rest of the system only tasks. I've been wiring it in through my log config module since before the integration was added however its still effected by the above change.

      My personal vote (unless anyone has a better idea) would be to remove the integration completely document the way of setting it up through the logging class and providing a 'post_execute' hook of some form on the StandardTaskRunner where people can flush errors using what not.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                Unassigned
                Reporter:
                robinedwards Robin Edwards
              • Votes:
                0 Vote for this issue
                Watchers:
                4 Start watching this issue

                Dates

                • Created:
                  Updated: