SPARK-14463: read.text broken for partitioned tables


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0
    • Component/s: SQL
    • Labels: None

    Description

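      Strongly typing the return value of read.text as Dataset[String] breaks loading a partitioned table (or any table whose path looks partitioned). Partition discovery adds the partition column to the schema, so the result has two fields and can no longer be mapped to a single String: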

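      import sqlContext.implicits._  // needed for toDF

      // Write a text table partitioned by column "a"
      Seq((1, "test"))
        .toDF("a", "b")
        .write
        .format("text")
        .partitionBy("a")
        .save("/home/michael/text-part-bug")

      // Reading it back throws the AnalysisException below
      sqlContext.read.text("/home/michael/text-part-bug")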
      
      org.apache.spark.sql.AnalysisException: Try to map struct<value:string,a:int> to Tuple1, but failed as the number of fields does not line up.
       - Input schema: struct<value:string,a:int>
       - Target schema: struct<value:string>;
      	at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.org$apache$spark$sql$catalyst$encoders$ExpressionEncoder$$fail$1(ExpressionEncoder.scala:265)
      	at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.validate(ExpressionEncoder.scala:279)
      	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:197)
      	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:168)
      	at org.apache.spark.sql.Dataset$.apply(Dataset.scala:57)
      	at org.apache.spark.sql.Dataset.as(Dataset.scala:357)
      	at org.apache.spark.sql.DataFrameReader.text(DataFrameReader.scala:450)
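
      A possible workaround, sketched here under assumptions and not necessarily the change that resolved this issue: load the files as an untyped DataFrame, keep only the value column, and convert to Dataset[String] afterwards. The helper name readTextValues is hypothetical and not part of Spark; the sketch assumes a Spark 2.0 build where Dataset and SQLContext.implicits are available.

```scala
import org.apache.spark.sql.{Dataset, SQLContext}

// Hypothetical helper (not part of Spark): read text files as Dataset[String]
// even when the directory layout adds partition columns to the schema.
def readTextValues(sqlContext: SQLContext, path: String): Dataset[String] = {
  import sqlContext.implicits._ // supplies the Encoder[String] used by .as[String]
  sqlContext.read
    .format("text")
    .load(path)       // schema: value, plus any discovered partition columns
    .select("value")  // drop the partition columns, keep only the line contents
    .as[String]
}
```

      Selecting the value column before calling as[String] makes the input schema line up with the single-field target schema, which is the mismatch reported in the exception above.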
      


          People

            Assignee: Jurriaan Pruis (jurriaanpruis)
            Reporter: Michael Armbrust (marmbrus)
            Votes: 1
            Watchers: 8
