SPARK-36327: Spark SQL creates staging dir inside database directory rather than inside table directory


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.1.2
    • Fix Version/s: 3.3.0
    • Component/s: Spark Core, SQL
    • Labels: None
    • Important

    Description

      Spark SQL creates the staging directory inside the database directory rather than inside the table directory.

       

      This occurs only when the table location uses the viewfs:// scheme; when the location is hdfs://, the staging directory is created in the expected place.
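
      For illustration, with a hypothetical viewfs table location (the paths below are made up, not taken from the report), the observed behaviour looks roughly like this:

      Table location:       viewfs://cluster/warehouse/mydb.db/mytable
      Expected staging dir: viewfs://cluster/warehouse/mydb.db/mytable/.hive-staging_hive_...
      Actual staging dir:   viewfs://cluster/warehouse/mydb.db/.hive-staging_hive_...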

       

      Based on further investigation in SaveAsHiveFile.scala, I could see that the directory hierarchy is not handled properly in the viewfs branch:
      the parent path (the database path) is passed instead of the actual directory (the table location).

      // Mostly copied from Context.java#getExternalTmpPath of Hive 1.2
      private def newVersionExternalTempPath(
          path: Path,
          hadoopConf: Configuration,
          stagingDir: String): Path = {
        val extURI: URI = path.toUri
        if (extURI.getScheme == "viewfs") {
          getExtTmpPathRelTo(path.getParent, hadoopConf, stagingDir)
        } else {
          new Path(getExternalScratchDir(extURI, hadoopConf, stagingDir), "-ext-10000")
        }
      }

      Please refer to the lines below:

      ===============================
      if (extURI.getScheme == "viewfs") {
      getExtTmpPathRelTo(path.getParent, hadoopConf, stagingDir)
      ===============================
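
      A minimal sketch of the kind of change that would keep the staging directory under the table directory is shown below. It simply passes the table path itself rather than its parent in the viewfs branch; this is an illustration based on the description above, not necessarily the exact patch that resolved the issue.

      ===============================
      if (extURI.getScheme == "viewfs") {
        // Use the table location itself so the staging dir is created
        // inside the table directory, not the database directory.
        getExtTmpPathRelTo(path, hadoopConf, stagingDir)
      } else {
        new Path(getExternalScratchDir(extURI, hadoopConf, stagingDir), "-ext-10000")
      }
      ===============================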

      Attachments

        Activity

          People

            Assignee: Senthil Kumar (senthh)
            Reporter: Senthil Kumar (senthh)
            Votes: 0
            Watchers: 4

            Dates

              Created:
              Updated:
              Resolved: