[PHOENIX-3156] DistinctPrefixFilter optimization produces incorrect results with some non-pk WHERE conditions - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Blocker
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 4.8.0
Component/s: None
Labels: None

Description

There's a corner case I found where a DISTINCT and GROUP BY query along a prefix of a compound row key might return incorrect results.

The filter relies on seeing the _0 column absolutely last, and not seeing all Cells that should be filtered. That assumption breaks in two scenarios:

1. We have a table with key (key1, key2, key3) and columns (c1 and c2). Now construct a query with WHERE <a clause that always matches c1>, <a clause that filters by c2> GROUP BY key1, key2. The filter would then mis-skip when it sees the Cell for c1.
2. We force lowercase column names. In that case those columns would sort after the _0 column, so the DistinctPrefixFilter would see the _0 column first and skip.

In both cases we are effectively changing the order in which the filters are applied. The DistinctPrefixFilter is no longer the last filter applied for the row.
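
The second scenario comes down to byte ordering of column qualifiers. Below is a minimal Java sketch of that ordering, assuming all columns live in the same column family and that cells within a row are ordered by unsigned byte comparison of their qualifiers (as HBase does). The class name `QualifierOrderDemo` and the column names `C1`, `C2`, `c1`, `c2` are hypothetical and only serve the illustration; this is not Phoenix code.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class QualifierOrderDemo {

    // Unsigned lexicographic comparison of two byte arrays.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    // Returns the qualifiers in the order a scan would present them within one row.
    static List<String> sortQualifiers(String... qualifiers) {
        List<String> sorted = new ArrayList<>(Arrays.asList(qualifiers));
        sorted.sort((p, q) -> compareBytes(
                p.getBytes(StandardCharsets.UTF_8),
                q.getBytes(StandardCharsets.UTF_8)));
        return sorted;
    }

    public static void main(String[] args) {
        // Upper-case names ('C' = 0x43) sort before "_0" ('_' = 0x5F): _0 is seen last.
        System.out.println(sortQualifiers("_0", "C1", "C2")); // prints [C1, C2, _0]
        // Lower-case names ('c' = 0x63) sort after "_0": _0 is seen first.
        System.out.println(sortQualifiers("_0", "c1", "c2")); // prints [_0, c1, c2]
    }
}
```

With upper-case names the _0 cell arrives after every other column of the row, which is what the filter expects. With lower-case names it arrives first, so a filter that acts on _0 runs before the cells that the WHERE conditions need to inspect.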

I can fix #1 (by ignoring all Cells other than the _0 one). I do not know how to fix case #2.

I think this is a blocker and we may have to undo the entire DISTINCT and GROUP BY prefix optimization.

ankit@apache.org, giacomotaylor, samarthjain.

Attachments

- 3156.txt, 06/Aug/16 00:32, 3 kB, Lars Hofhansl
- 3156-v2.txt, 06/Aug/16 04:02, 5 kB, Lars Hofhansl

Issue Links

relates to

PHOENIX-258 Use skip scan when SELECT DISTINCT on leading row key column(s) (Closed)

People

Assignee: Lars Hofhansl

Reporter: Lars Hofhansl

Votes: 0

Watchers: 3

Dates

Created: 06/Aug/16 00:30

Updated: 11/Aug/16 08:29

Resolved: 06/Aug/16 04:42