[SPARK-13801] DataFrame.col should return unresolved attribute - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Incomplete
Affects Version/s: None
Fix Version/s: None
Component/s: SQL
Labels:
- bulk-closed

Description

Recently I saw several JIRAs complaining about wrong results when using the DataFrame API. After checking their queries, I found the cause was an indirect self-join for which the users built wrong join conditions. For example:

val df = ...
val df2 = df.filter(...)
df.join(df2, (df("key") + 1) === df2("key"))

In this case, the confusing part is that df("key") and df2("key") refer to the same column, even though df and df2 are different DataFrames.

I think the biggest problem is that we give users the resolved attribute. However, a resolved attribute is not a real column, because a logical plan's output may change. For example, we generate new output for the right child in a self-join.

My proposal is that `DataFrame.col` should always return an unresolved attribute. We can still do the resolution to make sure the given column name is resolvable, but we don't return the resolved attribute; we just take the name and wrap it in UnresolvedAttribute.
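A rough sketch of the proposed behavior, written against the public API only and not taken from the linked pull request. It assumes Spark 2.0 or later (for `SparkSession`) and relies on `functions.col` producing a name-only column that is resolved later by the analyzer. The helper `colByName`, the object name, and the sample data are hypothetical.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object UnresolvedColSketch {
  // Hypothetical helper (not part of Spark): validate the name against df,
  // then hand back a name-only column instead of the resolved attribute.
  def colByName(df: DataFrame, name: String): Column = {
    df.col(name) // result discarded; throws AnalysisException if the name is not resolvable
    col(name)    // carries only the name; resolution happens when the query is analyzed
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("unresolved-col-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data for illustration.
    val df = Seq((1, "a"), (2, "b"), (3, "c")).toDF("key", "value")

    // The name-only column still works for ordinary single-DataFrame queries.
    df.select(colByName(df, "key")).show()

    // colByName(df, "missing") would fail with an AnalysisException,
    // because the up-front df.col check cannot resolve the name.
    spark.stop()
  }
}
```

Because the returned column no longer points at one specific plan's output, a condition that mixes columns from df and df2 in a self-join has to say which side each name belongs to, which is the behavior the proposal is after.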

Now if users run the example query I gave at the beginning, they will get an analysis exception, and they will understand that they need to alias df and df2 before the join.
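For illustration, aliasing both sides before the join could look like the sketch below. It assumes Spark 2.0 or later; the alias names "l" and "r", the helper `joinOnNextKey`, the sample data, and the filter predicate (a stand-in for the elided `filter(...)`) are all made up for this example.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object AliasedSelfJoin {
  // Alias both sides so the join condition names each side explicitly.
  // "l" and "r" are arbitrary alias names chosen for this sketch.
  def joinOnNextKey(df: DataFrame, df2: DataFrame): DataFrame =
    df.as("l").join(df2.as("r"), (col("l.key") + 1) === col("r.key"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("aliased-self-join")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data; df2 is derived from df, so the join is an indirect self-join.
    val df = Seq((1, "a"), (2, "b"), (3, "c")).toDF("key", "value")
    val df2 = df.filter($"key" > 1) // illustrative predicate only

    joinOnNextKey(df, df2).show()
    spark.stop()
  }
}
```

The qualified names `l.key` and `r.key` are resolved by alias when the join is analyzed, so the condition no longer depends on which DataFrame object a column reference was taken from.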

Issue Links

is duplicated by

SPARK-13393 Column mismatch issue in left_outer join using Spark DataFrame

Resolved

is related to

SPARK-10838 Repeat to join one DataFrame twice, there will be AnalysisException.

Closed

relates to

SPARK-13393 Column mismatch issue in left_outer join using Spark DataFrame

Resolved

SPARK-14040 Null-safe and equality join produces incorrect result with filtered dataframe

Resolved

SPARK-17154 Wrong result can be returned or AnalysisException can be thrown after self-join or similar operations

Resolved

links to

[Github] Pull Request #11632 (cloud-fan)

People

Assignee: Unassigned

Reporter: Wenchen Fan

Votes: 2

Watchers: 9

Dates

Created: 10/Mar/16 11:01

Updated: 21/May/19 04:12

Resolved: 21/May/19 04:12