Description
In python/pyspark/pandas/__init__.py there is currently a warning that is emitted when the PYARROW_IGNORE_TIMEZONE env var is not set (https://github.com/apache/spark/blob/187c4a9c66758e973633c5c309b551b1d9094e6e/python/pyspark/pandas/__init__.py#L44-L59):
    import logging
    logging.warning(
        "'PYARROW_IGNORE_TIMEZONE' environment variable was not set. It is required to "...
The module-level logging.warning() call silently calls logging.basicConfig() when the root logger has no handlers yet (at least in Python 3.9, which I tried).
(FYI: something like logging.getLogger(...).warning() would not make this silent call.)
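The difference between the two calls can be checked without pyspark at all. The following is a minimal sketch using only the standard library; the logger name "demo" and the variable names are made up for illustration, and the root logger is reset first so the result does not depend on earlier configuration.

```python
import logging

root = logging.getLogger()
# Start from a root logger without handlers so the check is reproducible.
for handler in list(root.handlers):
    root.removeHandler(handler)

# warning() on a named logger does not configure the root logger.
logging.getLogger("demo").warning("via a named logger")
handlers_after_named = len(root.handlers)

# The module-level function calls logging.basicConfig() when the root
# logger has no handlers, which installs a StreamHandler on it.
logging.warning("via the module-level function")
handlers_after_module = len(root.handlers)

print(handlers_after_named, handlers_after_module)
```

The first count stays at zero, while the second shows the handler that the implicit basicConfig() call added.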
This has the following side effect, which is very hard to track down:
importing `pyspark.pandas` (directly or indirectly somewhere) might break your logging setup (if PYARROW_IGNORE_TIMEZONE is not set).
Very basic example (assuming PYARROW_IGNORE_TIMEZONE is not set):
    import logging
    import pyspark.pandas

    logging.basicConfig(level=logging.DEBUG)
    logger = logging.getLogger("test")
    logger.warning("I warn you")
    logger.debug("I debug you")
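This will only produce the warning, not the debug line: logging.basicConfig() does nothing when the root logger already has a handler, so the root level stays at the default WARNING.
After removing the import pyspark.pandas line, the debug line is produced.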
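Until the import stops touching the root logger, one possible user-side workaround is to pass force=True to logging.basicConfig() (available since Python 3.8), which removes existing root handlers before configuring. The sketch below is an assumption-laden illustration, not a fix for pyspark itself: a plain logging.warning() call stands in for the pyspark.pandas import, and output goes to an in-memory stream so it can be inspected. Setting PYARROW_IGNORE_TIMEZONE before the import should also avoid the warning, since it is only emitted when the variable is unset.

```python
import io
import logging

root = logging.getLogger()
# Reset the root logger so the sketch starts from an unconfigured state.
for handler in list(root.handlers):
    root.removeHandler(handler)
root.setLevel(logging.WARNING)

# Stand-in for the import side effect: a module-level warning that
# implicitly calls logging.basicConfig() and installs a root handler.
logging.warning("stand-in for the pyspark.pandas import warning")

# Without force=True this call would be a no-op, because the root
# logger already has a handler at this point.
stream = io.StringIO()
logging.basicConfig(level=logging.DEBUG, stream=stream, force=True)

logger = logging.getLogger("test")
logger.warning("I warn you")
logger.debug("I debug you")

output = stream.getvalue()
print(output)
```

With force=True both the warning and the debug line reach the configured stream.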