[SPARK-14463] read.text broken for partitioned tables - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.0.0
Component/s: SQL
Labels: None
Target Version/s: 2.0.0

Description

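Strongly typing the return value of read.text as Dataset[String] breaks loading a partitioned table (or any table whose path looks partitioned). Partition discovery appends the partition columns to the schema, so the result no longer has the single string column that a Dataset[String] requires.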

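// toDF needs the SQLContext implicits (already imported in spark-shell)
import sqlContext.implicits._

Seq((1, "test"))
  .toDF("a", "b")
  .write
  .format("text")
  .partitionBy("a")
  .save("/home/michael/text-part-bug")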

sqlContext.read.text("/home/michael/text-part-bug")

org.apache.spark.sql.AnalysisException: Try to map struct<value:string,a:int> to Tuple1, but failed as the number of fields does not line up.
 - Input schema: struct<value:string,a:int>
 - Target schema: struct<value:string>;
	at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.org$apache$spark$sql$catalyst$encoders$ExpressionEncoder$$fail$1(ExpressionEncoder.scala:265)
	at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.validate(ExpressionEncoder.scala:279)
	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:197)
	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:168)
	at org.apache.spark.sql.Dataset$.apply(Dataset.scala:57)
	at org.apache.spark.sql.Dataset.as(Dataset.scala:357)
	at org.apache.spark.sql.DataFrameReader.text(DataFrameReader.scala:450)
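The failure comes down to a field-count check. The text source's own schema has a single value column, partition discovery adds a, and a String result accepts exactly one field. What follows is a toy Scala model of that check, not Spark's implementation. Field, render, validate and TextPartitionToy are hypothetical names, and the a:int column is copied from the error message rather than derived from a real path. Projecting onto value before the check brings the count back to one. In Spark terms that is roughly select("value") before as[String].

```scala
// Toy model only: Field, render, validate and TextPartitionToy are
// hypothetical names, not Spark classes. It mirrors the arity check
// reported in the AnalysisException for this issue.
final case class Field(name: String, dataType: String)

object TextPartitionToy {
  // Render a schema the way the error message does, e.g. struct<value:string,a:int>.
  def render(fields: Seq[Field]): String =
    fields.map(f => s"${f.name}:${f.dataType}").mkString("struct<", ",", ">")

  // Accept the input only if it has as many fields as the target expects.
  // A String result is a single-field target (Tuple1 / struct<value:string>).
  def validate(input: Seq[Field], target: Seq[Field]): Either[String, Seq[Field]] =
    if (input.length == target.length) Right(input)
    else Left(
      s"Try to map ${render(input)} to Tuple${target.length}, " +
        "but failed as the number of fields does not line up.")

  // Schema of the text source itself: one string column named "value".
  val textSchema: Seq[Field] = Seq(Field("value", "string"))

  // Assumption: partition discovery on the a=1 directory appended an int
  // column "a", as the input schema in the error message shows.
  val discovered: Seq[Field] = textSchema :+ Field("a", "int")

  def main(args: Array[String]): Unit = {
    // Fails: two input fields, one target field.
    println(validate(discovered, textSchema))
    // Succeeds: keep only "value" first, analogous to select("value") before as[String].
    println(validate(discovered.filter(_.name == "value"), textSchema))
  }
}
```

Running main prints a Left carrying the same message as the exception, then a Right with the single value field.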

Issue Links

links to

[Github] Pull Request #13104 (jurriaan)

[Github] Pull Request #13184 (rxin)

People

Assignee: Jurriaan Pruis

Reporter: Michael Armbrust

Votes: 1

Watchers: 8

Dates

Created: 07/Apr/16 19:58

Updated: 18/May/16 23:22

Resolved: 18/May/16 23:15