[IGNITE-8732] SQL: REPLICATED cache cannot be left-joined to PARTITIONED - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.5
Fix Version/s: 2.10
Component/s: sql
Labels:
- sql-engine

Description

Steps to reproduce

Run org.apache.ignite.sqltests.ReplicatedSqlTest#testLeftJoinReplicatedPartitioned
Observe that we have 2x results on 2-node cluster

Root Cause
left LEFT JOIN right ON cond operation assumes full scan of of a left expression. Currently we perform this scan on every node and then simply merge results on reducer. Two nodes, two scans of REPLICATED cache, 2x results.

Potential Solutions
We may consider several solutions. Deeper analysis is required to understand which is the right one.

Perform deduplication on reducer - this most prospective and general technique, described in more details below
Treat REPLICATED cache as PARTITIONED. Essentially, we just need to pass proper backup filter. But what if REPLICATED cache spans more nodes than PARTITIONED? We cannot rely on primary/backup in this case

Implement additional execution phase as follows:

SELECT left.cols, right.cols FROM left INNER JOIN right ON cond;                          // Get "inner join" part
UNION
UNICAST SELECT left.cols, [NULL].cols FROM left WHERE left.id NOT IN ([ids from the first phase]) // Get "outer join" part

Reducer Deduplication
The idea is to get all data locally and then perform final deduplication. This may incur high network overhead, because of lot of duplicated left parts would be transferred. However, this could be optimized greatly with the following techniques applied one after another

Semi-jions: left is joined on mapper node, but instead of sending (left, right) relation, we send (left) + (right)
In case left part is known to be idempotent (i.e. it produces the same result set on all nodes), only one node will send (left) + (right), other nodes will send (right) only
Merge left results with if needed (i.e. if idempotence-related opto was not applicable)
Join left and right parts on reducer

UPDATE

After a few attempts at the implementation, the solution of treating REPLICATED cache as PARTITIONED looks the most practical. The solution works in a limited case:

REPLICATED and PARTITIONED both have the same affinity function, number of partitions, node filter

- Note that REPLICATED has a different number of partitions by default

The JOIN is done on an affinity column of both caches

- Note that users often don’t create affinity keys for REPLICATED caches today

distributedJoins=false (distributed joins aren’t supported for now)

Attachments

Issue Links

is duplicated by

IGNITE-5016 SQL: Support LEFT JOIN from replicated table to a partitioned.

Resolved

relates to

IGNITE-13975 Validate that tables beind joined have the same partition distribution

Open

links to

GitHub Pull Request #8637

Activity

People

Assignee:: Stanislav Lukyanov

Reporter:: Vladimir Ozerov

Reviewer:: Taras Ledkov

Votes:: 2 Vote for this issue

Watchers:: 9 Start watching this issue

Dates

Created:: 07/Jun/18 09:43

Updated:: 05/Aug/21 11:38

Resolved:: 11/Jan/21 15:06

Time Tracking

Estimated:

Not Specified

Remaining:

Logged: