Spark / SPARK-24226

Reading data from Oracle 12c in Spark with numPartitions greater than 1 does not return the exact count


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.2.0
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels: Important

    Description

      Reading data from Oracle over JDBC through the Spark SQL context, as below.

      val query = """(select col1, col2, rownum from schematic.tablename) A"""

      val df = sparkcontextInstance.sqlcontext.read.format("jdbc")
      .option("url", urlstring)
      .option("dbtable", query)
      .option("user", username)
      .option("password", password)
      .option("numPartitions", 20)
      .option("partitionColumn", "rownum")
      .option("lowerBound", 1)
      .option("upperBound", 3000000)
      .option("fetchsize", 1500)
      .load()
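
      For context, here is a rough sketch (illustrative Scala only, not Spark's internal code; the clause strings assume the options above) of how the JDBC source turns lowerBound, upperBound and numPartitions into per-partition WHERE clauses on the partition column:

      // Illustrative sketch of the stride-based partitioning logic:
      // stride = upperBound / numPartitions - lowerBound / numPartitions
      val lowerBound = 1L
      val upperBound = 3000000L
      val numPartitions = 20
      val stride = upperBound / numPartitions - lowerBound / numPartitions  // 150000

      var current = lowerBound
      val whereClauses = (0 until numPartitions).map { i =>
        val lower = if (i == 0) None else Some(s"rownum >= $current")
        current += stride
        val upper = if (i == numPartitions - 1) None else Some(s"rownum < $current")
        (lower, upper) match {
          case (None, Some(u))    => s"$u or rownum is null"
          case (Some(l), None)    => l
          case (Some(l), Some(u)) => s"$l AND $u"
          case _                  => ""  // not reached when numPartitions > 1
        }
      }
      // whereClauses(0)  -> "rownum < 150001 or rownum is null"
      // whereClauses(1)  -> "rownum >= 150001 AND rownum < 300001"
      // whereClauses(19) -> "rownum >= 2850001"

      Because Oracle assigns ROWNUM to rows incrementally as they are returned, a standalone predicate such as rownum >= 150001 can never be satisfied, so only the first partition brings back any rows (ROWNUM 1 through 150000). That matches the count observed below.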

      df.count() returns only 150000, i.e. upperBound / numPartitions (3000000 / 20).

      The table has 3 million records
      The table does not have any numeric column, so ROWNUM was taken as the partition column.
      The code above therefore returns this incorrect DataFrame count.
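
      A commonly suggested workaround (a sketch only, not part of this ticket; the alias "rn" and the reuse of the same bounds are assumptions) is to materialize ROWNUM under an alias inside the subquery, so the per-partition WHERE clauses filter on an ordinary column of the derived table rather than re-evaluating Oracle's ROWNUM pseudo-column:

      // Hypothetical sketch: expose ROWNUM as a real column "rn" of the derived
      // table and partition on that column instead of the pseudo-column.
      val aliasedQuery = """(select col1, col2, rownum as rn from schematic.tablename) A"""

      val partitionedDf = sparkcontextInstance.sqlcontext.read.format("jdbc")
      .option("url", urlstring)
      .option("dbtable", aliasedQuery)
      .option("user", username)
      .option("password", password)
      .option("numPartitions", 20)
      .option("partitionColumn", "rn")
      .option("lowerBound", 1)
      .option("upperBound", 3000000)
      .option("fetchsize", 1500)
      .load()

      With bounds that cover the whole table, partitionedDf.count() should then see all 3 million rows, because each partition's predicate on rn selects a disjoint, non-empty slice of the inner query's ROWNUM values.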

          People

            Assignee: Unassigned
            Reporter: Chandu123 (Chandan)
            Votes: 1
            Watchers: 2
