I am trying to upgrade HBase (client and server) from 1.2.0 to 2.2.6 and have started seeing unexpected behavior in which filters encounter an ambiguous row during store file scans.
Is it valid for filters and comparators to be passed a fake cell when its row is set as the start row (inclusive by default) to skip the preceding rows during store file scans, for a scan issued from the client side?
When rows are persisted or updated on a table through bulk load, a scan that specifies a column appears to trigger this behavior, whereas a scan without columns does not.
From what I have found while troubleshooting so far, this appears to be triggered by the lazy seek inside StoreScanner with the StoreFileScanner implementation: the StoreFileScanner on the store heap eventually returns a fake cell as its current row, so that cell is passed to the filter, but it is filtered out later and never returned to the client.
This was not the case with HBase 1.7.2. I have created a couple of simple tests using HBase 1.7.2 and HBase 2.2.6 that bulk load some sample rows into a table and create a column-specific Scan to reproduce the behavior described above.
I simply copied KeyOnlyFilter and added a few loggers to capture the row keys passed to the filter, plus a few loggers to capture the row keys returned to the client.
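To compare the two sets of logged row keys, a small helper along these lines could be used. This is only a sketch in plain Java with no HBase dependency; the class and method names (`RowKeyDiff`, `seenOnlyByFilter`) and the sample keys `rowA`, `rowB`, `rowC` are hypothetical placeholders, not values taken from the repo or the screenshots.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class RowKeyDiff {

    /**
     * Returns the row keys that were logged inside the filter but never
     * appeared in the results returned to the client, in first-seen order.
     * Duplicate keys in the filter log are collapsed.
     */
    static List<String> seenOnlyByFilter(List<String> seenByFilter,
                                         List<String> returnedToClient) {
        Set<String> onlyInFilter = new LinkedHashSet<>(seenByFilter);
        onlyInFilter.removeAll(returnedToClient);
        return new ArrayList<>(onlyInFilter);
    }

    public static void main(String[] args) {
        // Hypothetical placeholder keys, standing in for the logged row keys.
        List<String> seenByFilter = Arrays.asList("rowA", "rowB", "rowC");
        List<String> returnedToClient = Arrays.asList("rowB", "rowC");

        // Any key listed here reached the filter without reaching the client.
        System.out.println(seenOnlyByFilter(seenByFilter, returnedToClient));
    }
}
```

Running the same comparison against the logs from both HBase versions would show whether the extra row key reaches the filter only in 2.2.6.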
Here is my working repo that demonstrates this divergent behavior: hbase-scans
I have a mapper that creates Puts with row keys 0, 2, and 3 and bulk loads those rows into the table. When a scan is issued with the HBase 2.2.6 API, the start row set on the Scan is passed to the filter during server-side execution.
Screenshot of the row keys seen by the filter during server-side execution with HBase 2.2.6.
Screenshot of the row keys seen by the filter with HBase 1.7.2.