Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: ManifoldCF 1.7, ManifoldCF 1.8
Fix Version/s: ManifoldCF 1.8.1, ManifoldCF 2.0.1, ManifoldCF 1.9, ManifoldCF 2.1
Component/s: None
Labels: None

After upgrading to ManifoldCF 1.7 or later, pre-existing documents are recrawled and re-indexed even if they have not changed since their last pre-upgrade crawl. The impact can be significant for large ManifoldCF deployments with millions of static documents.
There appear to be three contributing factors:
1. In PipelineObjectWithVersions#buildAddPipeline and IncrementalIngester#checkFetchDocument, the empty transformation version of a legacy document is different from the initial value of "0+0!".
2. In PipelineObjectWithVersions#buildAddPipeline, output versions are compared incorrectly: oldOutputVersion is compared to a VersionContext object instead of the version string, which can be obtained by calling VersionContext#getVersionString. If IPipelineSpecification#getStageDescriptionString continues to return a VersionContext object, renaming the method could be useful.
3. In PipelineObjectWithVersions#buildAddPipeline, a null value for newAuthorityNameString is not treated the same as an empty string, as it is in other methods.
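The three factors above can be illustrated with a minimal sketch. This is not the actual ManifoldCF patch: the classes VersionContextStub and LegacyVersionChecks and all of their methods are hypothetical, and it assumes that a legacy document's transformation version is stored as null or an empty string. Only the "0+0!" value and the getVersionString accessor come from the report itself.

```java
import java.util.Objects;

// Simplified stand-in for ManifoldCF's VersionContext. Only getVersionString()
// is named in the report; everything else here is assumed for illustration.
final class VersionContextStub {
    private final String versionString;

    VersionContextStub(String versionString) {
        this.versionString = versionString;
    }

    String getVersionString() {
        return versionString;
    }
}

// Hypothetical helper class; none of these methods exist in ManifoldCF.
final class LegacyVersionChecks {
    // Initial transformation version quoted in the report.
    static final String EMPTY_TRANSFORMATION_VERSION = "0+0!";

    private LegacyVersionChecks() {}

    // Factor 1: map a legacy (assumed null or empty) transformation version
    // onto the initial value so the two compare as equal.
    static String normalizeTransformationVersion(String version) {
        return (version == null || version.isEmpty())
            ? EMPTY_TRANSFORMATION_VERSION
            : version;
    }

    // Factor 2: compare the stored string against the version string,
    // not against the wrapper object itself.
    static boolean outputVersionUnchanged(String oldOutputVersion,
                                          VersionContextStub newOutputVersion) {
        return Objects.equals(oldOutputVersion, newOutputVersion.getVersionString());
    }

    // Factor 3: treat a null authority name the same as an empty string.
    static String normalizeAuthorityName(String authorityName) {
        return authorityName == null ? "" : authorityName;
    }
}
```

Comparing a String directly to a wrapper object, as in `"v1".equals(wrapper)`, is always false in Java because the two objects have different types. That would make every document look changed regardless of its content, which matches the behavior described in factor 2.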