Spark / SPARK-36327

Spark sql creates staging dir inside database directory rather than creating inside table directory


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.1.2
    • Fix Version/s: 3.3.0
    • Component/s: Spark Core, SQL
    • Labels:
      None
    • Flags:
      Important

      Description

      Spark SQL creates the staging directory inside the database directory rather than inside the table directory.

      This arises only when the table location uses the viewfs:// scheme; with hdfs:// locations the issue does not occur.

      Further investigation in SaveAsHiveFile.scala shows that the directory hierarchy is not handled correctly in the viewfs case: the parent path (the database directory) is passed instead of the actual table location.

      // Mostly copied from Context.java#getExternalTmpPath of Hive 1.2
      private def newVersionExternalTempPath(
          path: Path,
          hadoopConf: Configuration,
          stagingDir: String): Path = {
        val extURI: URI = path.toUri
        if (extURI.getScheme == "viewfs") {
          getExtTmpPathRelTo(path.getParent, hadoopConf, stagingDir)
        } else {
          new Path(getExternalScratchDir(extURI, hadoopConf, stagingDir), "-ext-10000")
        }
      }

      Please refer to the lines below:

      ===============================
      if (extURI.getScheme == "viewfs") {
        getExtTmpPathRelTo(path.getParent, hadoopConf, stagingDir)
      ===============================
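
      To make the misplacement concrete, here is a minimal, self-contained sketch (names such as tableLocation and stagingPrefix are illustrative, not from Spark; the parent helper stands in for Hadoop's Path#getParent) showing where the staging directory ends up when the parent path is used instead of the table location:

```scala
// Hypothetical illustration of the staging-dir placement described above.
object StagingDirDemo {
  // Minimal stand-in for Hadoop's Path#getParent: drop the last path segment.
  def parent(path: String): String = path.substring(0, path.lastIndexOf('/'))

  def main(args: Array[String]): Unit = {
    // Assumed example table location under a viewfs mount.
    val tableLocation = "viewfs://cluster/warehouse/mydb.db/mytable"
    val stagingPrefix = ".hive-staging"

    // Reported behaviour (viewfs branch): staging dir resolved against the
    // parent, i.e. the database directory.
    val insideDatabaseDir = s"${parent(tableLocation)}/$stagingPrefix"
    // Expected behaviour: staging dir resolved against the table directory.
    val insideTableDir = s"$tableLocation/$stagingPrefix"

    println(insideDatabaseDir) // viewfs://cluster/warehouse/mydb.db/.hive-staging
    println(insideTableDir)    // viewfs://cluster/warehouse/mydb.db/mytable/.hive-staging
  }
}
```

      In other words, passing path.getParent anchors the staging directory one level too high in the hierarchy, which is exactly what is observed when the location scheme is viewfs://.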


            People

            • Assignee:
              senthh Senthil Kumar
              Reporter:
              senthh Senthil Kumar
            • Votes:
              0
              Watchers:
              4
