  1. Spark
  2. SPARK-17810

Default spark.sql.warehouse.dir is relative to local FS but can resolve as HDFS path

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.1
    • Fix Version/s: 2.0.2, 2.1.0
    • Component/s: SQL
    • Labels:
      None

      Description

      Following SPARK-15899 and https://github.com/apache/spark/pull/13868#discussion_r82252372 we have a slightly different problem.

      The change removed the file: scheme from the default spark.sql.warehouse.dir as part of its fix, though the path is still clearly intended to be a local FS path and defaults to "spark-warehouse" in the user's home dir. However, when the default filesystem is HDFS, this scheme-less path is resolved as an HDFS path, where it almost surely doesn't exist.
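
The resolution behaviour described above can be illustrated with a small, self-contained sketch. It does not call Hadoop or Spark: `qualify` is a hypothetical helper that only mimics the general rule that a path without a scheme is qualified against the default filesystem, and the `hdfs://namenode:8020` URI and the `/user/alice` and `/home/alice` directories are made-up values.

```python
from urllib.parse import urlparse


def qualify(path: str, default_fs: str, working_dir: str) -> str:
    """Hypothetical helper: mimic how a scheme-less path is qualified.

    This illustrates the general rule only; it is not Hadoop's code.
    """
    # A path that already carries a scheme (file:, hdfs:, ...) is left alone.
    if urlparse(path).scheme:
        return path
    # A relative path is first anchored at the working directory.
    if not path.startswith("/"):
        path = working_dir.rstrip("/") + "/" + path
    # With no scheme, the default filesystem decides where the path lives.
    return default_fs.rstrip("/") + path


# Default filesystem is local: the path stays on the local FS.
local = qualify("spark-warehouse", "file:///", "/home/alice")

# Default filesystem is HDFS: the same scheme-less value becomes an HDFS path.
on_hdfs = qualify("spark-warehouse", "hdfs://namenode:8020", "/user/alice")

# An explicit file: scheme is unaffected by the default filesystem.
explicit = qualify("file:/tmp/spark-warehouse", "hdfs://namenode:8020", "/user/alice")

print(local)
print(on_hdfs)
print(explicit)
```

The same input string lands on two different filesystems depending only on the default filesystem, which is the failure mode this issue reports.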

      Although it can be fixed by overriding spark.sql.warehouse.dir to a path like "file:/tmp/spark-warehouse", or any valid HDFS path, this probably won't work on Windows (the original problem) and of course means the default fails to work for most HDFS use cases.
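
As a rough sketch of the workaround mentioned above, the snippet below builds an explicit file: URI for a local directory and formats it as a spark.sql.warehouse.dir setting. The `/tmp/spark-warehouse` directory is the example from the text; the variable names are illustrative, and the snippet assumes a POSIX-style path, so it says nothing about the Windows case.

```python
from pathlib import PurePosixPath

# Example local directory taken from the description; any absolute path would do.
warehouse_dir = PurePosixPath("/tmp/spark-warehouse")

# as_uri() yields an explicit file: URI (with an empty authority), so the
# value is not left to be resolved against the default filesystem.
warehouse_uri = warehouse_dir.as_uri()

# The resulting key=value pair could be supplied to Spark, for example
# through spark-submit's --conf option.
conf_arg = f"spark.sql.warehouse.dir={warehouse_uri}"
print(conf_arg)
```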

      There's a related problem here: the docs say the default should be spark-warehouse relative to the current working dir, not the user home dir. We can adjust that.

      PR coming shortly.

              People

              • Assignee: Sean Owen (srowen)
              • Reporter: Sean Owen (srowen)
              • Votes: 0
              • Watchers: 4
