Details
- Type: New Feature
- Status: Resolved
- Priority: Major
- Resolution: Duplicate
- Affects Version/s: 2.1.0
- Fix Version/s: None
- Component/s: None
Description
Sentry is a system, well known among Python developers, for capturing, classifying, tracking, and explaining tracebacks, helping people better understand what went wrong, how to reproduce the issue, and how to fix it.
Any Python Spark application is effectively divided into two parts:
1. The part that runs on the driver side. Users control this part fully, and reporting to Sentry from here is easy to do (a sketch follows this list).
2. The part that runs on the executors, i.e. Python UDFs and other transformation functions. Unfortunately, we cannot offer Sentry reporting there today, and that is the part this feature is about.
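For context, driver-side reporting already works with the standard Sentry client. A minimal sketch, assuming the `sentry_sdk` package is installed on the driver and `SENTRY_DSN` is set in its environment (both assumptions, not part of this proposal):

```python
import sentry_sdk
from pyspark.sql import SparkSession

# Picks up SENTRY_DSN and other settings from standard Sentry
# environment variables on the driver.
sentry_sdk.init()

spark = SparkSession.builder.appName("sentry-driver-example").getOrCreate()

try:
    # A driver-side failure (here, an analysis error) can be reported directly.
    spark.range(10).select("no_such_column").collect()
except Exception as exc:
    sentry_sdk.capture_exception(exc)
    raise
```

Nothing equivalent is possible for code that only ever runs inside the executors' Python workers.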
To simplify the development experience, it would be nice to have optional Sentry support at the PySpark worker level.
What could this feature look like?
1. PySpark gets a new extra named sentry which installs the Sentry client and anything else that is required. This is an optional install-time dependency.
2. The PySpark worker detects the presence of Sentry support and sends error reports to it (see the sketch after this list).
3. All Sentry configuration can and will be done via Sentry's standard environment variables.
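A rough sketch of how the worker-side behavior might look. The worker offers no such hook today, so this approximates it with a user-level wrapper; `with_sentry` and `_maybe_init_sentry` are hypothetical names introduced only for illustration, not an existing API:

```python
import functools

_SENTRY_INITIALIZED = False


def _maybe_init_sentry():
    """Initialize the Sentry client once per worker process, if available."""
    global _SENTRY_INITIALIZED
    if _SENTRY_INITIALIZED:
        return True
    try:
        import sentry_sdk
    except ImportError:
        # The optional dependency is absent; reporting is silently disabled.
        return False
    # Configuration comes from standard Sentry environment variables
    # (SENTRY_DSN, SENTRY_ENVIRONMENT, ...) visible to the executors.
    sentry_sdk.init()
    _SENTRY_INITIALIZED = True
    return True


def with_sentry(func):
    """Wrap a UDF so exceptions raised on executors are reported to Sentry."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            if _maybe_init_sentry():
                import sentry_sdk
                sentry_sdk.capture_exception(exc)
            raise
    return wrapper
```

A UDF author could stack `@with_sentry` under `@udf(...)` to opt in today; the built-in support proposed here would do the equivalent automatically inside the worker whenever the optional extra is installed, with no per-UDF wrapping.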
What will this feature give users?
1. Better exceptions in Sentry. When reported from the driver-side application, they currently all get recorded as `Py4JJavaError`, with the real executor exception buried in the traceback body.
2. A much easier time understanding the context in which something went wrong and why.
3. Simpler debugging of Python UDFs and easier reproduction of issues.
Issue Links
- duplicates: SPARK-22959 Configuration to select the modules for daemon and worker in PySpark (Resolved)