[SPARK-32809] RDD different partitions cause different results - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Invalid
Affects Version/s: 2.2.0
Fix Version/s: None
Component/s: Spark Core
Labels: None
Environment: Spark 2.2.0, Scala 2.11.8, hadoop-client 2.6.0
Flags: Important

Description

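import org.apache.spark.{SparkConf, SparkContext}
import org.junit.Test // assumption: the report does not say which test framework provides @Test

class Exec3 {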
  private val exec: SparkConf = new SparkConf().setMaster("local[1]").setAppName("exec3")
  private val context = new SparkContext(exec)
  context.setCheckpointDir("checkPoint")
 
  /**
   * Get the total per key.
   * The desired results are ("apple", 25) and ("huawei", 20),
   * but I actually get ("apple", 150) and ("huawei", 20).
   * When I change the master to local[3], the result is correct.
   * I want to know what causes this and how to solve it.
   */
  @Test
  def testError(): Unit ={
    val rdd = context.parallelize(Seq(("apple", 10), ("apple", 15), ("huawei", 20)))
    rdd.aggregateByKey(1.0)(
      seqOp = (zero, price) => price * zero,
      combOp = (curr, agg) => curr + agg).collect().foreach(println(_))
    context.stop()
  }
}
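
The numbers in the report are consistent with the documented semantics of `aggregateByKey`: `seqOp` folds the values of a key inside one partition, starting from the zero value, and `combOp` merges the per-partition results. The sketch below is plain Scala rather than Spark, and `simulate` is a hypothetical helper written only for illustration. It assumes that `local[1]` places all three records in a single partition and that `local[3]` places one record in each partition.

```scala
object AggregateByKeySketch {
  // Hypothetical helper (not a Spark API): mimics aggregateByKey on data
  // that has already been split into partitions.
  def simulate[K, V, U](partitions: Seq[Seq[(K, V)]], zero: U)(
      seqOp: (U, V) => U,
      combOp: (U, U) => U): Map[K, U] = {
    // Step 1: inside each partition, fold every key's values starting from zero.
    val perPartition: Seq[Map[K, U]] = partitions.map { part =>
      part.groupBy(_._1).map { case (k, kvs) =>
        k -> kvs.map(_._2).foldLeft(zero)(seqOp)
      }
    }
    // Step 2: merge the per-partition results of each key with combOp.
    perPartition.flatMap(_.toSeq).groupBy(_._1).map { case (k, kus) =>
      k -> kus.map(_._2).reduce(combOp)
    }
  }

  def main(args: Array[String]): Unit = {
    val data = Seq(("apple", 10), ("apple", 15), ("huawei", 20))
    val onePartition = Seq(data)            // assumed layout under local[1]
    val threePartitions = data.map(Seq(_))  // assumed layout under local[3]

    // Operators from the report: seqOp multiplies, combOp adds.
    val one = simulate(onePartition, 1.0)((acc, price) => price * acc, _ + _)
    val three = simulate(threePartitions, 1.0)((acc, price) => price * acc, _ + _)
    println(one("apple"))   // 150.0: 1.0 * 10 * 15, combOp never runs
    println(three("apple")) // 25.0: (1.0 * 10) + (1.0 * 15)

    // Summing in both steps with zero 0.0 does not depend on the layout.
    val fixedOne = simulate(onePartition, 0.0)((acc, price) => acc + price, _ + _)
    val fixedThree = simulate(threePartitions, 0.0)((acc, price) => acc + price, _ + _)
    println(fixedOne("apple"))   // 25.0
    println(fixedThree("apple")) // 25.0
  }
}
```

If a per-key sum is the goal, one option in Spark itself would be `rdd.aggregateByKey(0)(_ + _, _ + _)` or `rdd.reduceByKey(_ + _)`. Both give the same result for any partitioning, because addition is associative and 0 is its identity.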

People

Assignee: Unassigned

Reporter: zhangchenglong

Votes: 0

Watchers: 1

Dates

Created: 07/Sep/20 06:03

Updated: 12/Dec/22 18:10

Resolved: 07/Sep/20 06:47

Time Tracking

Estimated: 12h

Remaining: 12h

Logged: Not Specified