SPARK-3138

sqlContext.parquetFile should be able to take a single file as parameter

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.1.0
    • Component/s: SQL
    • Labels: None

      Description

      http://apache-spark-user-list.1001560.n3.nabble.com/sqlContext-parquetFile-path-fails-if-path-is-a-file-but-succeeds-if-a-directory-tp12345.html

      To reproduce this issue in spark-shell:

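      // Create a SQLContext; its implicits enable saveAsParquetFile on RDDs of case classes.
      val sqlContext = new org.apache.spark.sql.SQLContext(sc)
      import sqlContext._
      import org.apache.hadoop.fs.{FileSystem, Path}
      
      case class TestRDDEntry(key: Int, value: String)
      
      // Write 100 rows as a single partition under /tmp/parquet_test.
      val path = "/tmp/parquet_test"
      sc.parallelize(1 to 100).map(i => TestRDDEntry(i, s"val_$i")).coalesce(1).saveAsParquetFile(path)
      
      // List the .parquet part files in the output directory.
      val fsPath = new Path(path)
      val fs: FileSystem = fsPath.getFileSystem(sc.hadoopConfiguration)
      val children = fs.listStatus(fsPath).filter(_.getPath.getName.endsWith(".parquet"))
      
      // Read one part file directly instead of the directory; this call fails.
      val readFile = sqlContext.parquetFile(path + "/" + children(0).getPath.getName)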
      

      The last call throws an exception:

      java.lang.IllegalArgumentException: Expected file:/tmp/parquet_test/part-r-1.parquet for be a directory with Parquet files/metadata
              at org.apache.spark.sql.parquet.ParquetTypesConverter$.readMetaData(ParquetTypes.scala:374)
              at org.apache.spark.sql.parquet.ParquetTypesConverter$.readSchemaFromFile(ParquetTypes.scala:414)
              at org.apache.spark.sql.parquet.ParquetRelation.<init>(ParquetRelation.scala:66)
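
As the title of the linked mailing-list thread notes, the same call succeeds when the path is a directory. A minimal workaround sketch for affected versions, assuming a spark-shell session where `sc` is predefined and /tmp/parquet_test was written by the reproduction steps above (the name `readDir` is only illustrative):

```scala
// Assumption: run in spark-shell on an affected version, after the
// reproduction steps have written /tmp/parquet_test.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val path = "/tmp/parquet_test"

// Pass the containing directory rather than an individual part file.
val readDir = sqlContext.parquetFile(path)

// Force the read to confirm that the directory path loads.
println(readDir.count())
```

This reads every part file in the directory, so it is not a substitute for single-file loading when the directory holds more than one file.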
      

            People

            • Assignee: Unassigned
            • Reporter: chutium Teng Qiu