Spark / SPARK-49825

default value of `spark.sql.warehouse.dir` is not decoded correctly when saving table


Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.5.3
    • Fix Version/s: None
    • Component/s: PySpark
    • Labels: None
    • Environment: macOS 15.0, Java 8 Update 421

    Description

      I haven't looked into how general this problem is, but here's a very specific scenario which I ran into last night.

       

      When the `SparkSession` is created without setting the config `spark.sql.warehouse.dir`, the default value is `<cwd>/spark-warehouse`, and this path appears to be URL-encoded when printed via `spark.conf.get('spark.sql.warehouse.dir')`.

      So, for instance, any spaces in the path are replaced by "%20".

      If the value is stored encoded, it should be decoded wherever it is used as a filesystem path. Instead, the encoded path is taken literally, and consequently Spark writes tables to a different location than intended.
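To make the difference concrete, here is a small standalone sketch (plain Python, no Spark involved) that contrasts taking the reported value literally with percent-decoding it first. The helper names `literal_path` and `decoded_path` are made up for illustration and are not part of any Spark API; this only illustrates the symptom and says nothing about where in Spark the decoding is skipped.

```python
from urllib.parse import unquote, urlparse

# Value reported by spark.conf.get('spark.sql.warehouse.dir') in this issue.
reported = "file:/Users/user/cwd%20with%20space/spark-warehouse"


def literal_path(uri: str) -> str:
    """Hypothetical helper: path component with percent-escapes left as-is."""
    return urlparse(uri).path


def decoded_path(uri: str) -> str:
    """Hypothetical helper: path component with percent-escapes decoded."""
    return unquote(urlparse(uri).path)


# The literal form matches where the table ends up; the decoded form is the
# directory that was presumably intended.
print(literal_path(reported))
print(decoded_path(reported))
```

The two printed paths differ only in `%20` versus a real space, which matches the observed behaviour: a directory whose name literally contains `%20` appears next to the intended one.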

       

      Here's a minimal snippet to reproduce:

      ```py
      from pyspark.sql import SparkSession

      # No warehouse dir is configured, so Spark falls back to <cwd>/spark-warehouse.
      spark = SparkSession.builder.getOrCreate()

      spark.conf.get('spark.sql.warehouse.dir')  # 'file:/Users/user/cwd%20with%20space/spark-warehouse'

      df = ...  # any DataFrame
      df.write.saveAsTable('df')  # table will be saved at /Users/user/cwd%20with%20space/spark-warehouse
      ```

       

      Interestingly, this doesn't happen if the path is specified manually when creating the session, even if it resolves to the same location Spark would have used by default.

       

      ```py
      from pyspark.sql import SparkSession

      # Same location as the default, but passed explicitly.
      spark = SparkSession.builder.config('spark.sql.warehouse.dir', 'spark-warehouse/').getOrCreate()

      spark.conf.get('spark.sql.warehouse.dir')  # 'file:/Users/user/cwd with space/spark-warehouse'
      ```

       

      The above works fine.
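Based on that observation, a possible workaround is to always pass the warehouse location explicitly. The sketch below is only an assumption drawn from the behaviour described above and has not been checked against other Spark versions; `explicit_warehouse_dir` and `build_session` are hypothetical helper names, and the absolute-path variant is assumed to behave like the relative `'spark-warehouse/'` value shown earlier.

```python
from pathlib import Path


def explicit_warehouse_dir(base=None) -> str:
    """Hypothetical helper: un-encoded warehouse path under `base` (default: cwd)."""
    base_dir = Path(base) if base is not None else Path.cwd()
    return str(base_dir / "spark-warehouse")


def build_session():
    """Hypothetical helper: create a session with the warehouse dir set explicitly."""
    # Imported lazily so the path helper can be used without PySpark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .config("spark.sql.warehouse.dir", explicit_warehouse_dir())
        .getOrCreate()
    )
```

Calling `build_session()` instead of a bare `SparkSession.builder.getOrCreate()` should avoid the default-value code path, assuming no session already exists in the process.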

       

      P.S. Please forgive me if this is supposed to happen by design.

          People

            Assignee: Unassigned
            Reporter: Asad Shaikh (pantheraleo-7)