Spark / SPARK-24226

Reading data from Oracle 12c in Spark with numPartitions greater than 1 does not return the exact count


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.2.0
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels: Important

    Description

      Reading data from Oracle over JDBC through the Spark SQL context, as below.

      val query = """(select col1, col2, rownum from schematic.tablename) A"""

      val df = sparkcontextInstance.sqlcontext.read.format("jdbc")
      .option("url", urlstring)
      .option("dbtable", query)
      .option("user", username)
      .option("password", password)
      .option("numPartitions", 20)
      .option("partitionColumn", "rownum")
      .option("lowerBound", 1)
      .option("upperBound", 3000000)
      .option("fetchsize", 1500)
      .load()
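
      For context, here is a rough sketch (illustrative Scala only, not Spark's internal code; the clause strings assume the options above) of how the JDBC source turns lowerBound, upperBound and numPartitions into per-partition WHERE clauses on the partition column:

      // Illustrative sketch of the stride-based partitioning logic:
      // stride = upperBound / numPartitions - lowerBound / numPartitions
      val lowerBound = 1L
      val upperBound = 3000000L
      val numPartitions = 20
      val stride = upperBound / numPartitions - lowerBound / numPartitions  // 150000

      var current = lowerBound
      val whereClauses = (0 until numPartitions).map { i =>
        val lower = if (i == 0) None else Some(s"rownum >= $current")
        current += stride
        val upper = if (i == numPartitions - 1) None else Some(s"rownum < $current")
        (lower, upper) match {
          case (None, Some(u))    => s"$u or rownum is null"
          case (Some(l), None)    => l
          case (Some(l), Some(u)) => s"$l AND $u"
          case _                  => ""  // not reached when numPartitions > 1
        }
      }
      // whereClauses(0)  -> "rownum < 150001 or rownum is null"
      // whereClauses(1)  -> "rownum >= 150001 AND rownum < 300001"
      // whereClauses(19) -> "rownum >= 2850001"

      Because Oracle assigns ROWNUM to rows incrementally as they are returned, a standalone predicate such as rownum >= 150001 can never be satisfied, so only the first partition brings back any rows (ROWNUM 1 through 150000). That matches the count observed below.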

      df.count() returns only 150000, i.e. upperBound / numPartitions (3000000 / 20).

      The table has 3 million records
      The table does not have any numeric column, so ROWNUM was taken as the partition column.
      The code above therefore returns this incorrect DataFrame count.
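
      A commonly suggested workaround (a sketch only, not part of this ticket; the alias "rn" and the reuse of the same bounds are assumptions) is to materialize ROWNUM under an alias inside the subquery, so the per-partition WHERE clauses filter on an ordinary column of the derived table rather than re-evaluating Oracle's ROWNUM pseudo-column:

      // Hypothetical sketch: expose ROWNUM as a real column "rn" of the derived
      // table and partition on that column instead of the pseudo-column.
      val aliasedQuery = """(select col1, col2, rownum as rn from schematic.tablename) A"""

      val partitionedDf = sparkcontextInstance.sqlcontext.read.format("jdbc")
      .option("url", urlstring)
      .option("dbtable", aliasedQuery)
      .option("user", username)
      .option("password", password)
      .option("numPartitions", 20)
      .option("partitionColumn", "rn")
      .option("lowerBound", 1)
      .option("upperBound", 3000000)
      .option("fetchsize", 1500)
      .load()

      With bounds that cover the whole table, partitionedDf.count() should then see all 3 million rows, because each partition's predicate on rn selects a disjoint, non-empty slice of the inner query's ROWNUM values.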

          People

            Assignee: Unassigned
            Reporter: Chandu123 (Chandan)
            Votes: 1
            Watchers: 2
