Description
NUTCH-1679 needs to check whether some rows exist, and the proposal there is to use store.get(TableUtil.reverseUrl(url)).
This will have a considerable impact on performance, since every column of the row will be fetched.
Some datastores (like HBase) implement a call that just checks whether a row exists, so no row data is transferred over the network.
If a datastore can't handle an "exists" call, it can default to a get.
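A minimal sketch of the proposed fallback, assuming a Java interface with a default method. The names RowStore, GetOnlyStore and NativeExistsStore are hypothetical and invented for illustration; this is not the Gora DataStore API, and an in-memory HashMap stands in for a real backend. The getCalls counter only exists to make the cost of the fallback visible.

```java
import java.util.HashMap;
import java.util.Map;

public class ExistsFallbackSketch {

    // Hypothetical minimal store contract; NOT the real Gora DataStore interface.
    interface RowStore<K, V> {
        // Fetches the whole row (every column); returns null when the row is absent.
        V get(K key);

        // Fallback for stores without a native "exists" call: do a full get
        // and discard the data. Correct, but pays the full fetch cost.
        default boolean exists(K key) {
            return get(key) != null;
        }
    }

    // A store that only knows how to get(); it inherits the get-based exists().
    static class GetOnlyStore<K, V> implements RowStore<K, V> {
        protected final Map<K, V> rows = new HashMap<>();
        int getCalls = 0; // counts full-row fetches, to make the cost visible

        void put(K key, V value) {
            rows.put(key, value);
        }

        @Override
        public V get(K key) {
            getCalls++;
            return rows.get(key);
        }
    }

    // A store with a cheap native existence check (in the spirit of what HBase
    // offers), overriding the default so no row data is fetched.
    static class NativeExistsStore<K, V> extends GetOnlyStore<K, V> {
        @Override
        public boolean exists(K key) {
            return rows.containsKey(key); // key lookup only; get() is never called
        }
    }

    public static void main(String[] args) {
        GetOnlyStore<String, String> plain = new GetOnlyStore<>();
        NativeExistsStore<String, String> nativeStore = new NativeExistsStore<>();
        plain.put("com.example.www:http/", "row data");
        nativeStore.put("com.example.www:http/", "row data");

        System.out.println(plain.exists("com.example.www:http/"));       // true
        System.out.println(nativeStore.exists("com.example.www:http/")); // true
        System.out.println(plain.getCalls + " " + nativeStore.getCalls); // 1 0
    }
}
```

With a default method, existing datastores keep working unchanged through the get-based fallback, while backends that can answer cheaply override exists() and avoid transferring the row.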
Issue Links
- relates to: NUTCH-1679 UpdateDb using batchId, link may override crawled page. (Closed)