[HBASE-17125] Inconsistent result when use filter to read data - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.0.0
Component/s: None
Labels: None
Hadoop Flags: Incompatible change
Release Note:
Marked setMaxVersions() and setMaxVersions(int) on Scan and Get as deprecated. They are easily confused with the column family's max versions setting, so use readAllVersions() and readVersions(int) instead.

Description

Assume a column's max versions is 3 and we write 4 versions of this column. The oldest version is not removed immediately, but from the user's point of view it is gone. When the user queries with a filter and the filter skips a newer version, the oldest version becomes visible again. After the region is compacted, however, the oldest version is never seen again. This is confusing for the user: the same query returns inconsistent results before and after region compaction.

The cause is the matchColumn method of UserScanQueryMatcher. It first checks the cell against the filter and only then checks the number of versions needed. So if the filter skips a newer version, the oldest version is returned again as long as it has not been physically removed.
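To make the ordering problem concrete, here is a minimal toy model of one column's versions. It is not HBase code: the class FilterBeforeVersionCheck and its methods read and compact are hypothetical names, and timestamps stand in for cells. It only mimics the "filter first, then count versions" order described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.LongPredicate;

/** Toy model of a single column; a sketch for illustration, not HBase code. */
final class FilterBeforeVersionCheck {

    /** Filter first, then count versions (the order described in this issue). */
    static List<Long> read(List<Long> storedNewestFirst, LongPredicate filter, int maxVersions) {
        List<Long> result = new ArrayList<>();
        for (long ts : storedNewestFirst) {
            if (!filter.test(ts)) {
                continue; // a skipped cell does not count toward the version limit
            }
            if (result.size() == maxVersions) {
                break;
            }
            result.add(ts);
        }
        return result;
    }

    /** Models compaction: physically drop everything beyond the column's max versions. */
    static List<Long> compact(List<Long> storedNewestFirst, int maxVersions) {
        int keep = Math.min(maxVersions, storedNewestFirst.size());
        return new ArrayList<>(storedNewestFirst.subList(0, keep));
    }

    public static void main(String[] args) {
        List<Long> stored = Arrays.asList(4L, 3L, 2L, 1L); // 4 versions written, max versions = 3
        LongPredicate skipNewest = ts -> ts != 4L;         // the filter skips the newest version

        System.out.println(read(stored, skipNewest, 3));             // [3, 2, 1]
        System.out.println(read(compact(stored, 3), skipNewest, 3)); // [3, 2]
    }
}
```

In this model the version with timestamp 1 is returned before compaction and disappears afterwards, which is the inconsistency reported here.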

After an offline discussion with Apache9 and fenghh, we have two candidate solutions for this problem. The first idea is to check the number of versions first and then check the cell against the filter. This matches the javadoc of setFilter, which says the filter is called after all tests for ttl, column match, deletes and max versions have been run.

  /**
   * Apply the specified server-side filter when performing the Query.
   * Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
   * for ttl, column match, deletes and max versions have been run.
   * @param filter filter to run on the server
   * @return this for invocation chaining
   */
  public Query setFilter(Filter filter) {
    this.filter = filter;
    return this;
  }

But this idea has another problem. Suppose a column's max versions is 5 and the user's query needs only 3 versions. The matcher first checks the version count and then checks the cell against the filter, so the result may contain fewer than 3 cells, even though there are 2 more stored versions that are never read.
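The drawback of the first idea can be seen in the same kind of toy model. This is again a sketch and not HBase code: VersionCheckBeforeFilter and read are hypothetical names, and timestamps stand in for cells. Here every cell counts toward the requested version limit before the filter sees it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.LongPredicate;

/** Toy model of the first idea; a sketch for illustration, not HBase code. */
final class VersionCheckBeforeFilter {

    /** Count versions first, then apply the filter. */
    static List<Long> read(List<Long> storedNewestFirst, LongPredicate filter, int requestedVersions) {
        List<Long> result = new ArrayList<>();
        int seen = 0;
        for (long ts : storedNewestFirst) {
            if (seen == requestedVersions) {
                break; // version limit reached before the filter is consulted
            }
            seen++; // the cell counts toward the limit whether or not the filter keeps it
            if (filter.test(ts)) {
                result.add(ts);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Long> stored = Arrays.asList(5L, 4L, 3L, 2L, 1L); // column max versions = 5
        LongPredicate skipNewest = ts -> ts != 5L;             // the filter skips the newest version

        // The user asks for 3 versions but gets only 2; versions 2 and 1 are never read.
        System.out.println(read(stored, skipNewest, 3)); // [4, 3]
    }
}
```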

So the second idea has three steps:
1. Check against the max versions of the column.
2. Check the kv against the filter.
3. Check against the number of versions the user asked for.
But this makes ScanQueryMatcher more complicated, and it breaks the javadoc of Query.setFilter.
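The three steps above can be sketched in the same toy model. This is only an illustration under stated assumptions, not the patch attached to this issue: ThreeStepMatcher and read are hypothetical names, and timestamps stand in for cells.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.LongPredicate;

/** Toy model of the second idea; a sketch for illustration, not HBase code. */
final class ThreeStepMatcher {

    static List<Long> read(List<Long> storedNewestFirst, LongPredicate filter,
                           int columnMaxVersions, int requestedVersions) {
        List<Long> result = new ArrayList<>();
        int seen = 0;
        for (long ts : storedNewestFirst) {
            // Step 1: never look past the column's max versions, compacted or not.
            if (seen == columnMaxVersions) {
                break;
            }
            seen++;
            // Step 2: apply the filter to the cell.
            if (!filter.test(ts)) {
                continue;
            }
            // Step 3: stop once the user has the number of versions requested.
            if (result.size() == requestedVersions) {
                break;
            }
            result.add(ts);
        }
        return result;
    }

    public static void main(String[] args) {
        // Max versions 3, 4 versions written, filter skips the newest one.
        List<Long> uncompacted = Arrays.asList(4L, 3L, 2L, 1L);
        List<Long> compacted = Arrays.asList(4L, 3L, 2L);
        System.out.println(read(uncompacted, ts -> ts != 4L, 3, 3)); // [3, 2]
        System.out.println(read(compacted, ts -> ts != 4L, 3, 3));   // [3, 2]

        // Max versions 5, user asks for 3, filter skips the newest one.
        List<Long> five = Arrays.asList(5L, 4L, 3L, 2L, 1L);
        System.out.println(read(five, ts -> ts != 5L, 5, 3));        // [4, 3, 2]
    }
}
```

In this model the result is the same before and after compaction, and the user still receives 3 versions when enough of them pass the filter.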

We don't have a final solution for this problem yet. Suggestions are welcome.

Attachments

17125-slack-13.txt, 22/Jun/17 23:01, 45 kB, Ted Yu
example.diff, 18/Nov/16 09:01, 4 kB, Guanghao Zhang
HBASE-17125.master.001.patch, 09/Mar/17 06:52, 12 kB, Guanghao Zhang
HBASE-17125.master.002.patch, 10/Mar/17 09:03, 13 kB, Guanghao Zhang
HBASE-17125.master.002.patch, 10/Mar/17 03:45, 13 kB, Guanghao Zhang
HBASE-17125.master.003.patch, 14/Mar/17 10:34, 28 kB, Guanghao Zhang
HBASE-17125.master.004.patch, 15/Mar/17 03:15, 29 kB, Guanghao Zhang
HBASE-17125.master.005.patch, 15/Mar/17 05:19, 29 kB, Guanghao Zhang
HBASE-17125.master.006.patch, 17/Mar/17 10:21, 30 kB, Guanghao Zhang
HBASE-17125.master.007.patch, 13/Apr/17 04:16, 62 kB, Guanghao Zhang
HBASE-17125.master.008.patch, 14/Apr/17 03:32, 62 kB, Guanghao Zhang
HBASE-17125.master.009.patch, 21/Apr/17 10:55, 66 kB, Guanghao Zhang
HBASE-17125.master.009.patch, 21/Apr/17 09:01, 65 kB, Guanghao Zhang
HBASE-17125.master.010.patch, 04/May/17 02:46, 65 kB, Guanghao Zhang
HBASE-17125.master.011.patch, 21/Jun/17 08:54, 65 kB, Guanghao Zhang
HBASE-17125.master.011.patch, 19/Jun/17 08:03, 65 kB, Guanghao Zhang
HBASE-17125.master.012.patch, 22/Jun/17 06:54, 51 kB, Guanghao Zhang
HBASE-17125.master.013.patch, 22/Jun/17 08:43, 40 kB, Guanghao Zhang
HBASE-17125.master.014.patch, 23/Jun/17 05:40, 33 kB, Guanghao Zhang
HBASE-17125.master.015.patch, 23/Jun/17 08:29, 33 kB, Guanghao Zhang
HBASE-17125.master.016.patch, 24/Jun/17 01:15, 34 kB, Guanghao Zhang
HBASE-17125.master.017.patch, 24/Jun/17 15:06, 34 kB, Guanghao Zhang
HBASE-17125.master.018.patch, 26/Jun/17 10:06, 35 kB, Guanghao Zhang
HBASE-17125.master.019.patch, 01/Aug/17 07:19, 32 kB, Guanghao Zhang
HBASE-17125.master.020.patch, 07/Aug/17 02:47, 35 kB, Guanghao Zhang
HBASE-17125.master.020.patch, 03/Aug/17 09:25, 35 kB, Guanghao Zhang
HBASE-17125.master.021.patch, 07/Aug/17 06:43, 35 kB, Guanghao Zhang
HBASE-17125.master.022.patch, 10/Aug/17 13:12, 35 kB, Guanghao Zhang
HBASE-17125.master.checkReturnedVersions.patch, 21/Jun/17 23:16, 44 kB, Guanghao Zhang
HBASE-17125.master.no-specified-filter.patch, 21/Jun/17 09:03, 39 kB, Guanghao Zhang

Issue Links

blocks: HBASE-18471 The DeleteFamily cell is skipped when StoreScanner seeks to next column (Resolved)
causes: HBASE-23845 Remove deprecated setMaxVersions() from Scan (Resolved)
causes: HBASE-23846 Remove deprecated setMaxVersions(int) from Scan (Resolved)
relates to: HBASE-23074 scan#setVersion is invalid. (Resolved)
links to: Review Board (master)
