Spark / SPARK-15376

DataFrame write.jdbc() inserts more rows than actual


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 1.4.1, 1.5.0, 1.6.1
    • Fix Version/s: None
    • Component/s: None
    • Environment: CentOS 6, cluster mode
      Cores: 300 (300 granted, 0 left)
      Executor Memory: 45.0 GB
      Submit Date: Wed May 18 10:26:40 CST 2016

    Description

      It's an odd bug that occurs in the following situation:

      Bar.scala
          val rddRaw = sc.textFile("xxx").map(xxx).sample(false, 0.15)
          // The number of rows actually inserted into MySQL is larger than the RDD's record count.
          // In my case: 239994 records in the RDD vs. ~241300 rows inserted into the database.
          println(rddRaw.count())

          // Iterate over the rows in another way; if the Range for-loop is dropped, the bug does not occur.
          for (some_id <- Range(some_ids_all_range)) {
            rddRaw.filter(_._2 == some_id)
              .randomSplit(Array(x, x, x), 1)
              .foreach { rd =>
                // val curCnt = rd.count()  // if count() is invoked on rd before the write, the counts come out right
                rd.map(x => new TestRow(null, xxx)).toDF().write.mode(SaveMode.Append).jdbc(xxx)
              }
          }
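
      If the extra rows come from the sampled data being recomputed, one workaround is to persist the sampled RDD before counting and writing, so that count() and the JDBC write see the same rows. The snippet below is only a hedged sketch of that idea, not a confirmed fix for this issue: the input path, tuple schema, column names, JDBC URL, and table name are hypothetical placeholders rather than values from the job above, and it uses the Spark 1.x SQLContext API to match the affected versions.

          import java.util.Properties

          import org.apache.spark.sql.{SQLContext, SaveMode}
          import org.apache.spark.storage.StorageLevel

          val sqlContext = new SQLContext(sc)   // sc: the existing SparkContext, as in the report
          import sqlContext.implicits._

          // Persist the sampled RDD so that count() and the later write act on the same rows.
          // Without persistence (and without a fixed seed), sample() may be re-evaluated on
          // each action and select a different subset of records every time.
          val sampled = sc.textFile("hdfs:///input/path")   // hypothetical input path
            .map(line => (line, line.length))
            .sample(withReplacement = false, fraction = 0.15, seed = 42L)
            .persist(StorageLevel.MEMORY_AND_DISK)

          println(sampled.count())                          // stable count after persist

          val props = new Properties()                      // JDBC credentials would go here
          sampled.toDF("value", "len")
            .write
            .mode(SaveMode.Append)
            .jdbc("jdbc:mysql://host:3306/db", "test_table", props)   // hypothetical URL and table

      If persisting (or fixing the sample seed) makes the counts match, that would point at the unpersisted sample()/randomSplit() lineage being re-evaluated between count() and the JDBC write, rather than at the JDBC writer itself.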
      

          People

            Assignee: Unassigned
            Reporter: xiaoyu chen (chenxiaoyu3)
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved: