SPARK-21330

Bad partitioning does not allow reading a JDBC table with extreme values on the partition column

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1.1
    • Fix Version/s: 2.1.2, 2.2.1, 2.3.0
    • Component/s: SQL
    • Labels: None

      Description

      When using "extreme" values in the partition column (such as randomly generated Long values), the partition-count computation can overflow, leading to the following warning message:

      WARN JDBCRelation: The number of partitions is reduced because the specified number of partitions is less than the difference between upper bound and lower bound. Updated number of partitions: -1559072469251914524; Input number of partitions: 20; Lower bound: -7701345953623242445; Upper bound: 9186325650834394647.

      When this happens, no data is read from the table.
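
      For reference, this can be reproduced with any partitioned JDBC read whose bounds span nearly the full Long range. A minimal sketch (the JDBC URL, table, and column names are hypothetical; the bounds are the ones from the warning above; spark is an existing SparkSession):

          // Partitioned JDBC read; Spark splits the id range into numPartitions slices.
          val df = spark.read
            .format("jdbc")
            .option("url", "jdbc:postgresql://localhost/test")  // hypothetical
            .option("dbtable", "events")                        // hypothetical
            .option("partitionColumn", "id")
            .option("lowerBound", "-7701345953623242445")
            .option("upperBound", "9186325650834394647")
            .option("numPartitions", "20")
            .load()

          df.count()  // 0 rows: the overflowed partition count yields no partitions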

      This happens because upperBound - lowerBound can overflow a 64-bit Long in the following check in org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala:

      if ((upperBound - lowerBound) >= partitioning.numPartitions)

      The funny thing is that we worry about overflows only a few lines later:

          // Overflow and silliness can happen if you subtract then divide.
          // Here we get a little roundoff, but that's (hopefully) OK.
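
      To see the overflow concretely, plug the numbers from the warning into the check (a standalone sketch, runnable in the Scala REPL):

          val lowerBound = -7701345953623242445L  // from the warning message
          val upperBound = 9186325650834394647L
          val numPartitions = 20

          // The true difference (~1.7e19) exceeds Long.MaxValue (~9.2e18),
          // so the subtraction wraps around to a negative number.
          val diff = upperBound - lowerBound
          println(diff)                   // -1559072469251914524
          println(diff >= numPartitions)  // false: the partition count is then
                                          // set to the negative diff, and no
                                          // partitions (hence no rows) are produced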

      A better check would be:

      if ((upperBound - partitioning.numPartitions) >= lowerBound)
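
      With the same numbers, the rearranged check behaves correctly, because it subtracts only the small partition count from one bound instead of subtracting the two far-apart bounds from each other (a sketch; it assumes upperBound is not within numPartitions of Long.MinValue):

          val lowerBound = -7701345953623242445L
          val upperBound = 9186325650834394647L
          val numPartitions = 20

          // upperBound - numPartitions cannot wrap for these inputs, so the
          // comparison stays in range and the requested 20 partitions are kept.
          println((upperBound - numPartitions) >= lowerBound)  // true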


    People

    • Assignee: Andrew Ray (a1ray)
    • Reporter: Stefano Parmesan (parmesan)
