[SPARK-34844] JDBCRelation columnPartition function includes the first stride in the lower partition - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Not A Problem
Affects Version/s: 3.0.0
Fix Version/s: None
Component/s: SQL
Labels: None

Description

Currently, columnPartition in JDBCRelation adds the first stride to the lower bound before it computes the upper limit of the first (lowest) partition. Because of this, the lower bound itself is never used as the ceiling for that partition.

For example, say the data ranges from 0 to 10, there are 10 partitions, and lowerBound is set to 1. The first partition should contain only values < 1. However, in the current implementation, it contains all values < 2.
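The example above can be reproduced with a small standalone model of the partition loop. This is only a sketch: whereClauses and CurrentBehaviorSketch are hypothetical names, the code is not the Spark source, it ignores NULL handling, and it assumes a stride of 1 (implied by the "< 2" result described above).

```scala
object CurrentBehaviorSketch {
  // Hypothetical stand-in for the partition loop described in this issue.
  // Returns one WHERE-clause predicate per partition.
  def whereClauses(
      column: String,
      lowerBound: Long,
      stride: Long,
      numPartitions: Int): Seq[String] = {
    var currentValue = lowerBound
    (0 until numPartitions).map { i =>
      // The first partition has no lower limit.
      val lBound = if (i != 0) Some(s"$column >= $currentValue") else None
      // The stride is added on every iteration, including i == 0.
      currentValue += stride
      // The last partition has no upper limit.
      val uBound = if (i != numPartitions - 1) Some(s"$column < $currentValue") else None
      (lBound, uBound) match {
        case (Some(l), Some(u)) => s"$l AND $u"
        case (None, Some(u))    => u
        case (Some(l), None)    => l
        case (None, None)       => "" // single partition: no predicate
      }
    }
  }

  def main(args: Array[String]): Unit = {
    // lowerBound = 1, 10 partitions, assumed stride of 1
    whereClauses("id", 1L, 1L, 10).foreach(println)
  }
}
```

With these assumed inputs the first predicate is id < 2 rather than id < 1, which is the behavior this issue describes.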

A possible easy fix would be changing the following code on line 132:

currentValue += stride

To:

if (i != 0) currentValue += stride

Alternatively, move currentValue += stride into the if statement on line 131, although this introduces a rather ugly side effect.
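To see what the first proposed change would do, here is a minimal sketch of the same kind of loop with the stride skipped for the first partition. As before, ProposedFixSketch and whereClauses are hypothetical names, the code is a simplified model rather than the Spark source, NULL handling is omitted, and the stride of 1 is an assumption.

```scala
object ProposedFixSketch {
  // Hypothetical model of the loop with the proposed one-line change applied.
  def whereClauses(
      column: String,
      lowerBound: Long,
      stride: Long,
      numPartitions: Int): Seq[String] = {
    var currentValue = lowerBound
    (0 until numPartitions).map { i =>
      val lBound = if (i != 0) Some(s"$column >= $currentValue") else None
      // Proposed change: do not add the stride for the first partition,
      // so its upper limit is lowerBound itself.
      if (i != 0) currentValue += stride
      val uBound = if (i != numPartitions - 1) Some(s"$column < $currentValue") else None
      (lBound, uBound) match {
        case (Some(l), Some(u)) => s"$l AND $u"
        case (None, Some(u))    => u
        case (Some(l), None)    => l
        case (None, None)       => "" // single partition: no predicate
      }
    }
  }

  def main(args: Array[String]): Unit = {
    // Same assumed inputs as the example in the description.
    whereClauses("id", 1L, 1L, 10).foreach(println)
  }
}
```

Under the same assumed inputs (lowerBound = 1, 10 partitions), the first predicate becomes id < 1 and the last becomes id >= 9, so every later partition boundary shifts down by one stride.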

Issue Links

is related to

SPARK-34843 JDBCRelation columnPartition function improperly determines stride size. Upper bound is skewed due to stride alignment. (Resolved)

People

Assignee: Unassigned

Reporter: Jason Yarbrough

Votes: 0

Watchers: 2

Dates

Created: 23/Mar/21 20:52

Updated: 25/Mar/21 00:19

Resolved: 25/Mar/21 00:18