  Spark / SPARK-6330

newParquetRelation gets incorrect FileSystem


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.3.0
    • Fix Version/s: 1.3.1, 1.4.0
    • Component/s: SQL
    • Labels: None

    Description

      Here's a snippet from newParquet.scala:

      def refresh(): Unit = {
        val fs = FileSystem.get(sparkContext.hadoopConfiguration)

        // Support either reading a collection of raw Parquet part-files, or a collection of folders
        // containing Parquet files (e.g. partitioned Parquet table).
        val baseStatuses = paths.distinct.map { p =>
          val qualified = fs.makeQualified(new Path(p))

          if (!fs.exists(qualified) && maybeSchema.isDefined) {
            fs.mkdirs(qualified)
            prepareMetadata(qualified, maybeSchema.get, sparkContext.hadoopConfiguration)
          }

          fs.getFileStatus(qualified)
        }.toArray

      If we run this locally while the paths point to S3, fs resolves to the default (local or HDFS) filesystem and is therefore incorrect for those paths. A fix is to construct the FileSystem for each path separately.
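
      A minimal sketch of that fix, assuming the same refresh() context as the snippet above (paths, maybeSchema, prepareMetadata): resolve the FileSystem from each path's own scheme via Path.getFileSystem instead of taking the default filesystem once. The actual patch may differ in details.

      // Support either reading a collection of raw Parquet part-files, or a collection of folders
      // containing Parquet files (e.g. partitioned Parquet table).
      val baseStatuses = paths.distinct.map { p =>
        val path = new Path(p)
        // Resolve the FileSystem from the path's scheme (e.g. s3n://, hdfs://, file://),
        // so an S3 path gets an S3 FileSystem even when the default filesystem is local.
        val fs = path.getFileSystem(sparkContext.hadoopConfiguration)
        val qualified = fs.makeQualified(path)

        if (!fs.exists(qualified) && maybeSchema.isDefined) {
          fs.mkdirs(qualified)
          prepareMetadata(qualified, maybeSchema.get, sparkContext.hadoopConfiguration)
        }

        fs.getFileStatus(qualified)
      }.toArray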


            People

              Assignee: vlyubin (Volodymyr Lyubinets)
              Reporter: vlyubin (Volodymyr Lyubinets)
              Votes: 0
              Watchers: 4
