[SPARK-13473] Predicate can't be pushed through project with nondeterministic field

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.5.2, 1.6.0, 2.0.0
Fix Version/s: 1.5.3, 1.6.1, 2.0.0
Component/s: SQL
Labels: pull-request-available
Target Version/s: 1.5.3, 1.6.1, 2.0.0

Description

The following Spark shell snippet reproduces this issue:

import org.apache.spark.sql.functions._

val parallelism = 8 // Set this to the shell's default parallelism
val df = sqlContext.
  range(2 * parallelism). // `parallelism` partitions (8 here), 2 elements per partition
  select(
    col("id"),
    monotonicallyIncreasingId().as("long_id")
  )

df.show()

// +---+-----------+
// | id|    long_id|
// +---+-----------+
// |  0|          0|
// |  1|          1|
// |  2| 8589934592|
// |  3| 8589934593|
// |  4|17179869184|
// |  5|17179869185|
// |  6|25769803776|
// |  7|25769803777|
// |  8|34359738368|
// |  9|34359738369|
// | 10|42949672960|
// | 11|42949672961|
// | 12|51539607552|
// | 13|51539607553|
// | 14|60129542144|
// | 15|60129542145|
// +---+-----------+

df.
  filter(col("id") === 3). // 2nd element in the 2nd partition
  show()

// +---+----------+
// | id|   long_id|
// +---+----------+
// |  3|8589934592|
// +---+----------+

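The filtered result should show long_id 8589934593, the value listed for id 3 in the unfiltered output. Instead it shows 8589934592, which indicates that the predicate was pushed down through the projection: monotonicallyIncreasingId was evaluated after filtering, when the row with id 3 was the first row left in its partition. monotonicallyIncreasingId is nondeterministic, so a predicate must not be pushed through a project that contains it.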
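
The difference between the two evaluation orders can be sketched in plain Scala, without Spark. This is a simplified model rather than Spark's implementation: it assumes the documented ID layout (partition index in the upper bits, per-partition record counter in the lower 33 bits) and the 8 partitions of 2 rows from the snippet above. The names `PushdownSketch`, `longId`, `projectThenFilter`, and `filterThenProject` are hypothetical and exist only for illustration.

```scala
object PushdownSketch {
  // Assumed ID layout: partition index shifted left by 33 bits,
  // plus the row's position within its partition.
  def longId(partitionIndex: Int, rowIndexInPartition: Long): Long =
    (partitionIndex.toLong << 33) + rowIndexInPartition

  // Hypothetical data layout matching the report: ids 0..15 in 8 partitions of 2 rows.
  val partitions: Vector[Vector[Long]] =
    (0L until 16L).toVector.grouped(2).toVector

  // Intended order: assign IDs to every row first, then filter on id.
  def projectThenFilter(target: Long): Seq[(Long, Long)] =
    partitions.zipWithIndex.flatMap { case (rows, p) =>
      rows.zipWithIndex.map { case (id, i) => (id, longId(p, i.toLong)) }
    }.filter { case (id, _) => id == target }

  // Order after the predicate is pushed below the projection:
  // rows are filtered first, so the per-partition counter restarts at 0.
  def filterThenProject(target: Long): Seq[(Long, Long)] =
    partitions.zipWithIndex.flatMap { case (rows, p) =>
      rows.filter(_ == target).zipWithIndex.map { case (id, i) =>
        (id, longId(p, i.toLong))
      }
    }
}
```

Under these assumptions, `PushdownSketch.projectThenFilter(3L)` yields `(3, 8589934593)`, while `PushdownSketch.filterThenProject(3L)` yields `(3, 8589934592)`, the value observed in the filtered output above.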

Issue Links

is related to

SPARK-14241 Output of monotonically_increasing_id lacks stable relation with rows of DataFrame (Resolved)

links to

[Github] Pull Request #11348 (liancheng)

[Github] Pull Request #11864 (liancheng)

People

Assignee: Cheng Lian

Reporter: Cheng Lian

Votes: 0

Watchers: 3

Dates

Created: 24/Feb/16 18:20

Updated: 09/Jul/24 03:50

Resolved: 25/Feb/16 12:50