Details
- Type: Bug
- Status: Open
- Priority: Minor
- Resolution: Unresolved
- Affects Version: 3.5.3
- Fix Version: None
- Component: None
- Environment: macOS 15.0, Java 8 Update 421
Description
I haven't looked into how general this problem is, but here's a very specific scenario I ran into last night.
When the `SparkSession` is created without specifying the config `spark.sql.warehouse.dir`, the default value is `<cwd>/spark-warehouse`, and this path appears URL-encoded when printed via `spark.conf.get('spark.sql.warehouse.dir')`.
So, for instance, any spaces present in the path are replaced by "%20".
If that were just how the value is displayed, the path would be decoded whenever it is actually used, but it turns out the encoded path is taken literally, and consequently Spark writes tables to a different location than intended.
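For reference, the "%20" substitution is ordinary percent-encoding of a file URI. A standalone sketch of the round trip with `urllib.parse` (the path here is a made-up example mirroring the scenario above):

```python
from urllib.parse import quote, unquote

# Hypothetical path containing a space, mirroring the scenario above
raw = '/Users/user/cwd with space/spark-warehouse'

encoded = quote(raw)  # spaces become %20; '/' is left untouched by default
decoded = unquote(encoded)

print(encoded)           # /Users/user/cwd%20with%20space/spark-warehouse
print(decoded == raw)    # True
```

The bug amounts to Spark treating the `encoded` form as if it were the `decoded` one.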
Here's a minimal snippet to reproduce:
```py
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.conf.get('spark.sql.warehouse.dir')  # 'file:/Users/user/cwd%20with%20space/spark-warehouse'

df = spark.createDataFrame([(1,)], ['a'])  # any DataFrame will do
df.write.saveAsTable('df')  # table is written under the literal directory /Users/user/cwd%20with%20space/spark-warehouse
```
Interestingly, this doesn't happen if the path is specified manually when creating the session, even when the value is exactly what Spark would have used by default.
```py
from pyspark.sql import SparkSession
spark = SparkSession.builder.config('spark.sql.warehouse.dir', 'spark-warehouse/').getOrCreate()
spark.conf.get('spark.sql.warehouse.dir') # 'file:/Users/user/cwd with space/spark-warehouse'
```
The above works fine.
P.S. Please forgive me if this is supposed to happen by design.
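Besides setting the config explicitly as above, another workaround on the consuming side is to percent-decode the configured value before treating it as a filesystem path. A minimal sketch (the helper name and sample URI are my own illustration, not Spark API):

```python
from urllib.parse import unquote

def decoded_warehouse_path(conf_value: str) -> str:
    """Strip a leading 'file:' scheme, if present, and percent-decode the rest."""
    path = conf_value[len('file:'):] if conf_value.startswith('file:') else conf_value
    return unquote(path)

# e.g. applied to the value returned by spark.conf.get('spark.sql.warehouse.dir')
print(decoded_warehouse_path('file:/Users/user/cwd%20with%20space/spark-warehouse'))
# /Users/user/cwd with space/spark-warehouse
```

This only helps code that reads the config value; it doesn't change where `saveAsTable` actually writes.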