Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Incomplete
Affects Version/s: 1.4.1, 1.5.0, 1.6.1
None
None
CentOS 6 cluster mode
Cores: 300 (300 granted, 0 left)
Executor Memory: 45.0 GB
Submit Date: Wed May 18 10:26:40 CST 2016
Description
This is an odd bug that occurs in the following situation:
Bar.scala
val rddRaw = sc.textFile("xxx").map(xxx).sample(false, 0.15)
println(rddRaw.count())
// The number of rows actually inserted into MySQL is larger than the RDD's record count.
// In my case: 239994 (RDD) vs. ~241300 (inserted into the database).

// Iterate over all rows in another way. If the Range for loop is dropped, the bug does not occur.
for (some_id <- Range(some_ids_all_range)) {
  rddRaw.filter(_._2 == some_id).randomSplit(Array(x, x, x), 1)
    .foreach(
      rd => {
        // val curCnt = rd.count() // if count() is invoked on rd before the write, the result is correct
        rd.map(x => new TestRow(null, xxx)).toDF().write.mode(SaveMode.Append).jdbc(xxx)
      }
    )
}
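To narrow down where the extra rows come from, one option is to check, without touching the database, whether the random splits of each filtered RDD add up to the filtered RDD itself. The sketch below is only a diagnostic under stated assumptions, not a confirmed fix: the object name `SplitCountCheck`, the helper `splitCounts`, the synthetic `(payload, id)` data standing in for `sc.textFile("xxx").map(xxx)`, the split weights, and the `local[2]` master are all hypothetical. It also persists the sampled RDD, on the unverified assumption that recomputing the lineage for every split might contribute to the mismatch.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object SplitCountCheck {
  // Returns (rows in the filtered RDD, rows summed over its random splits) for one id.
  def splitCounts(sc: SparkContext, someId: Int): (Long, Long) = {
    // Hypothetical stand-in for sc.textFile("xxx").map(xxx): (payload, id) pairs.
    val rddRaw = sc.parallelize(1 to 100000, 8)
      .map(i => (s"row-$i", i % 10))
      .sample(false, 0.15)
      // Ask Spark to keep the sampled rows so later jobs can normally reuse them
      // instead of recomputing the lineage (assumption: recomputation is a suspect).
      .persist(StorageLevel.MEMORY_AND_DISK)

    val filtered = rddRaw.filter(_._2 == someId)
    // Same seed (1) as in the report; the weights here are placeholders.
    val parts = filtered.randomSplit(Array(0.6, 0.2, 0.2), 1)

    val result = (filtered.count(), parts.map(_.count()).sum)
    rddRaw.unpersist()
    result
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("split-count-check").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val (expected, splitTotal) = splitCounts(sc, 3)
      println(s"filtered=$expected, sum of splits=$splitTotal")
    } finally {
      sc.stop()
    }
  }
}
```

If the two numbers agree for every id while the database still ends up with more rows, the discrepancy is more likely on the write side (for example, tasks that ran more than once while appending) than in `sample` or `randomSplit`. That is a hypothesis to test against the real job, not a conclusion from this report.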