[SPARK-8599] Improve non-deterministic expression handling - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: 1.4.0
Fix Version/s: 1.5.0
Component/s: SQL
Labels: None
Target Version/s: 1.5.0

Description

Right now, random-number-generating expressions are implemented as ordinary expressions, so we have to track them in many places in the optimizer and handle them specially. Wherever that tracking is missed, they are treated as stateless expressions and behave unexpectedly (e.g. SPARK-8023).
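To illustrate the kind of surprise described above, here is a minimal sketch in plain Python. It is a toy model, not Spark code: `make_rand`, `plan_a`, and `plan_b` are hypothetical names invented for this example. It compares a plan that materializes a random value once per row with a plan that a naive rewrite might produce by inlining the random expression at each use site, as if it were stateless.

```python
import random

# Toy model, not Spark code: an "expression" is a function from a row to a value.
def make_rand(seed):
    """Return a stateful expression, loosely analogous to a seeded random column."""
    rng = random.Random(seed)
    return lambda row: rng.random()

rows = [{"id": i} for i in range(5)]

# Plan A: evaluate the random expression once per row, then reference
# the materialized value twice.
rand_a = make_rand(42)
plan_a = []
for row in rows:
    r = rand_a(row)
    plan_a.append((r, r))

# Plan B: a naive rewrite that treats the expression as stateless and
# inlines it at both use sites, so it is evaluated twice per row.
rand_b = make_rand(42)
plan_b = [(rand_b(row), rand_b(row)) for row in rows]

# The two plans are only equivalent if the expression has no hidden state.
all_equal_a = all(x == y for x, y in plan_a)
all_equal_b = all(x == y for x, y in plan_b)
print(all_equal_a, all_equal_b)
```

In Plan A both references always agree; in Plan B they diverge because each evaluation advances the generator. A rewrite that is harmless for stateless expressions therefore changes the query result here.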

Issue Links

blocks: SPARK-7157 Add approximate stratified sampling to DataFrame (Resolved)

Sub-Tasks

1.	After initializing a DataFrame with random columns and a seed, df.show should return same value	Resolved	Wenchen Fan
2.	After initializing a DataFrame with random columns and a seed, ordering by that random column should return same sorted order	Resolved	Wenchen Fan
3.	Filter using non-deterministic expressions should not be pushed down	Resolved	Wenchen Fan
4.	If order by clause has non-deterministic expressions, we should add a project to materialize results of these expressions	Resolved	Wenchen Fan
5.	Improve project collapse with nondeterministic expressions	Resolved	Wenchen Fan
6.	Support mutable state in code gen expressions	Resolved	Wenchen Fan
7.	make deterministic describing the tree rather than the expression	Resolved	Wenchen Fan
8.	Initialize nondeterministic expressions in code gen fallback mode	Resolved	Reynold Xin
9.	add initialization phase for nondeterministic expression	Resolved	Wenchen Fan
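Sub-tasks 3 and 7 can be pictured together with a small sketch, again in plain Python rather than Spark's actual classes. Everything here (`Expr`, `self_deterministic`, `can_push_down`) is a hypothetical name chosen for illustration. The assumption is that a `deterministic` flag describes the whole expression tree, so a single nondeterministic leaf marks every ancestor, and a filter pushdown rule only needs to check the root of the predicate.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Expr:
    """Toy expression node (hypothetical, not Spark's Expression class)."""
    name: str
    children: List["Expr"] = field(default_factory=list)
    # True when this node itself carries no hidden state.
    self_deterministic: bool = True

    @property
    def deterministic(self) -> bool:
        # Describes the tree, not just this node: any nondeterministic
        # descendant makes the whole expression nondeterministic.
        return self.self_deterministic and all(
            c.deterministic for c in self.children
        )

def can_push_down(predicate: Expr) -> bool:
    """A filter may move below other operators only if it is deterministic."""
    return predicate.deterministic

col_a = Expr("a")
lit = Expr("0.5")
rand = Expr("rand", self_deterministic=False)

safe = Expr(">", [col_a, lit])                        # a > 0.5
unsafe = Expr(">", [Expr("+", [col_a, rand]), lit])   # a + rand > 0.5

print(can_push_down(safe), can_push_down(unsafe))
```

With the flag defined over the tree, optimizer rules do not have to search for specific random expressions; they ask one question of the predicate's root.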

People

Assignee: Wenchen Fan
Reporter: Yin Huai
Votes: 2
Watchers: 7

Dates

Created: 24/Jun/15 20:24
Updated: 29/Jul/15 04:39
Resolved: 29/Jul/15 04:39