Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-32809

RDD different partitions cause didderent results

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Invalid
    • 2.2.0
    • None
    • Spark Core
    • None
    • spark2.2.0 ,scala 2.11.8 , hadoop-client2.6.0

    • Important

    Description

      class Exec3 {
        private val exec: SparkConf = new SparkConf().setMaster("local[1]").setAppName("exec3")
        private val context = new SparkContext(exec)
        context.setCheckpointDir("checkPoint")
       
        /**
         * get total number by key 
         * in this project desired results are ("apple",25) ("huwei",20)
         * but in fact i get ("apple",150) ("huawei",20)
         *   when i change it to local[3] the result is correct
         *  i want to know   which cause it and how to slove it 
         */
        @Test
        def testError(): Unit ={
          val rdd = context.parallelize(Seq(("apple", 10), ("apple", 15), ("huawei", 20)))
          rdd.aggregateByKey(1.0)(
            seqOp = (zero, price) => price * zero,
            combOp = (curr, agg) => curr + agg).collect().foreach(println(_))
          context.stop()
        }
      }
      

      Attachments

        Activity

          People

            Unassigned Unassigned
            mrzhang zhangchenglong
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated:
                Original Estimate - 12h
                12h
                Remaining:
                Remaining Estimate - 12h
                12h
                Logged:
                Time Spent - Not Specified
                Not Specified