Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
0.8
None
None
Description
Currently the method ScoringFilter.updateDbScore() doesn't use the "old" value from the existing CrawlDB. Instead it uses the value taken from the fetchlist in the current segment, which is only a snapshot of the "old" value taken at the moment the fetchlist was generated.
The problem with this approach is that once we allow generate/fetch/update cycles to be interleaved, the initial score in the CrawlDatum instance that comes from the current segment may already be outdated: another updatedb run may have changed the DB score in the meantime.
For this reason we should always assume that the value from CrawlDB, if it exists, represents the most recent version of the CrawlDatum before the update, and use that instance as the base. The fetchlist value should serve only as a fallback when CrawlDB has no entry.
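
The proposed rule can be sketched as follows. This is only an illustration, not the actual Nutch code: the `Datum` class is a hypothetical stand-in for CrawlDatum, `chooseBase` and `updatedScore` are made-up helper names, and the additive score update is an assumption chosen to make the difference visible.

```java
public class BaseScoreSketch {
    /** Hypothetical stand-in for a CrawlDatum; only the score matters here. */
    static final class Datum {
        final float score;
        Datum(float score) { this.score = score; }
    }

    /**
     * Picks the instance to use as the base for the update: the CrawlDB
     * entry if it exists, otherwise the snapshot from the fetchlist.
     */
    static Datum chooseBase(Datum fromCrawlDb, Datum fromFetchlist) {
        return fromCrawlDb != null ? fromCrawlDb : fromFetchlist;
    }

    /**
     * Assumed update rule for illustration: add the contribution from
     * inlinks to the base score. A real scoring filter may differ.
     */
    static float updatedScore(Datum fromCrawlDb, Datum fromFetchlist,
                              float inlinkContribution) {
        Datum base = chooseBase(fromCrawlDb, fromFetchlist);
        return base.score + inlinkContribution;
    }

    public static void main(String[] args) {
        Datum snapshot = new Datum(1.0f); // score when the fetchlist was generated
        Datum current = new Datum(2.5f);  // score after an interleaved updatedb

        // CrawlDB entry exists: the stale snapshot is ignored.
        System.out.println(updatedScore(current, snapshot, 0.5f)); // prints 3.0

        // No CrawlDB entry: fall back to the fetchlist snapshot.
        System.out.println(updatedScore(null, snapshot, 0.5f));    // prints 1.5
    }
}
```

With the current behaviour the first case would start from the stale snapshot score of 1.0 and silently discard the change made by the interleaved updatedb.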