Description
Reading DataFrames from Parquet files in S3 and joining them fails when the join is invoked by column name: the result has 0 rows. The join works correctly if an explicit join condition is used instead:
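from pyspark.sql import SQLContext

# sc is assumed to be an existing SparkContext (for example, the one provided by the pyspark shell)
sqlContext = SQLContext(sc)
a = sqlContext.read.parquet('s3://path-to-data-a/')
b = sqlContext.read.parquet('s3://path-to-data-b/')

# join by column name: result 0 rows
c = a.join(b, on='id', how='left_outer')
c.count()

# join by explicit condition: correct output
d = a.join(b, a['id']==b['id'], how='left_outer')
d.count()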
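For reference, a left outer join should return at least one row for every row of the left input, so a count of 0 can only be correct if `a` itself is empty. The following is a minimal plain-Python sketch of that expectation, not Spark code. The helper `left_outer_join` and the sample rows are hypothetical and only illustrate the semantics; SQL null-key handling and column name collisions are ignored.

```python
def left_outer_join(left, right, key):
    """Left outer join of two lists of dicts on a shared key column."""
    # Index the right-side rows by their key value.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    # Collect right-side columns (other than the key) for None padding.
    right_cols = []
    for row in right:
        for col in row:
            if col != key and col not in right_cols:
                right_cols.append(col)

    joined = []
    for lrow in left:
        matches = index.get(lrow[key], [])
        if matches:
            # One output row per matching right-side row.
            for rrow in matches:
                merged = dict(lrow)
                merged.update({c: v for c, v in rrow.items() if c != key})
                joined.append(merged)
        else:
            # No match: keep the left row and pad right columns with None.
            merged = dict(lrow)
            merged.update({c: None for c in right_cols})
            joined.append(merged)
    return joined


# Hypothetical sample data standing in for the two Parquet inputs.
a_rows = [{"id": 1, "x": "a1"}, {"id": 2, "x": "a2"}, {"id": 3, "x": "a3"}]
b_rows = [{"id": 2, "y": "b2"}]

result = left_outer_join(a_rows, b_rows, "id")
print(len(result))  # 3: every row of a_rows survives the join
```

Under these semantics both join forms in the report should yield the same count, which is why the 0-row result from the column-name form looks like a bug rather than a data issue.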
Issue Links
- relates to SPARK-13427 Support USING clause in JOIN (Resolved)