There is currently a warning when the PYARROW_IGNORE_TIMEZONE env var is not set (https://github.com/apache/spark/blob/187c4a9c66758e973633c5c309b551b1d9094e6e/python/pyspark/pandas/__init__.py#L44-L59):
The logging.warning() call silently calls logging.basicConfig() when the root logger has no handlers yet (at least in Python 3.9, which I tried).
(FYI: something like logging.getLogger(...).warning() would not make this silent call.)
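To illustrate the difference between the two calls, here is a minimal sketch that uses only the standard library. The logger name "some.library" is a made-up placeholder, not something pyspark uses.

```python
import logging

root = logging.getLogger()
# Start from a clean root logger so the demonstration is reproducible.
for handler in list(root.handlers):
    root.removeHandler(handler)

# A named logger does not touch the root logger's handlers. With no handler
# configured anywhere, the record goes to logging's last-resort stderr handler.
logging.getLogger("some.library").warning("via named logger")
handlers_after_named = len(root.handlers)

# The module-level helper calls logging.basicConfig() when the root logger
# has no handlers, which attaches a StreamHandler to the root logger.
logging.warning("via module-level helper")
handlers_after_module = len(root.handlers)

print(handlers_after_named, handlers_after_module)
```

The root logger has no handlers after the first call and one handler after the second, which is what makes a later logging.basicConfig() call a no-op.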
The silent logging.basicConfig() call has the following side effect, which is very hard to track down:
importing `pyspark.pandas` (directly or indirectly) might break your logging setup (if PYARROW_IGNORE_TIMEZONE is not set).
Very basic example (assuming PYARROW_IGNORE_TIMEZONE is not set):
This will only produce the warning, not the debug line.
After removing the `import pyspark.pandas` line, the debug line is produced.