The currently published sedona Python package declares an explicit dependency on pyspark.
On Spark platforms such as Databricks, Spark comes pre-installed but is not registered with pip. A `pip install sedona` therefore installs a second copy of pyspark, which in the best case is just superfluous. In the worst case it conflicts with the pre-installed Spark.
Workarounds, such as installing sedona without its dependencies (e.g. `pip install --no-deps sedona`), can work for a while.
But this is fragile: as soon as something validates the declared dependencies, as setuptools entry points do for example, it will break.
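To illustrate why the no-dependencies workaround is fragile, here is a minimal sketch of the kind of check a dependency validator performs. It is not what setuptools actually runs; it only compares the names of declared requirements against installed distributions and ignores version specifiers. The helper names (`requirement_name`, `is_installed`, `unmet_requirements`) are made up for this example.

```python
import re
from importlib import metadata


def requirement_name(requirement):
    """Extract the distribution name from a string such as 'pyspark>=2.3.0'."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    if match is None:
        raise ValueError(f"cannot parse requirement: {requirement!r}")
    return match.group(1)


def is_installed(name):
    """Return True if pip-visible metadata exists for the distribution."""
    try:
        metadata.version(name)
        return True
    except metadata.PackageNotFoundError:
        return False


def unmet_requirements(requirements, installed=is_installed):
    """List declared requirements for which no installed distribution is found."""
    unmet = []
    for req in requirements:
        # Requirements guarded by an "extra" marker are optional; skip them.
        if ";" in req and "extra" in req.split(";", 1)[1]:
            continue
        name = requirement_name(req)
        if not installed(name):
            unmet.append(name)
    return unmet
```

On a platform where Spark is pre-installed outside of pip, a call such as `unmet_requirements(metadata.requires("sedona") or [])` would be expected to report pyspark as missing after a no-dependencies install, even though Spark itself works.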
I guess there are two options:
- Remove the pyspark dependency completely, considering it too "obvious" to declare.
- Add pyspark as an optional dependency via `extras_require`, under an extra called "spark".
The second option would allow a pip install as below, which would get sedona and the corresponding pyspark distribution:
`pip install sedona[spark]`
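For the `extras_require` option, the change to the packaging script could look roughly like this. It is a sketch only: the function names are hypothetical, the real `install_requires` list is not reproduced here, and any pyspark version pin would have to come from the project.

```python
def sedona_setup_kwargs():
    """Return setup() keyword arguments with pyspark moved into an extra."""
    return {
        "name": "sedona",
        # The project's non-Spark dependencies would stay here (omitted in this sketch).
        "install_requires": [],
        # pyspark is only pulled in by `pip install sedona[spark]`.
        "extras_require": {"spark": ["pyspark"]},
    }


def main():
    # Imported lazily so the kwargs can be inspected without setuptools installed.
    from setuptools import setup

    setup(**sedona_setup_kwargs())
```

With a layout like this, a plain `pip install sedona` leaves the platform's pre-installed Spark untouched, while local development setups can opt in to the extra.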
I'd be willing to open a corresponding pull request if one of these options is accepted.