ManifoldCF / CONNECTORS-1009

CMIS Repository Connector does not handle document updating properly

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: ManifoldCF 1.7
    • Fix Version/s: ManifoldCF 1.7
    • Component/s: CMIS connector
    • Labels:
      None

      Description

      Following the fix for CONNECTORS-1004, it seems that CmisRepositoryConnector does not handle document updates properly.

      Case Scenario:

      • Create a continuous crawling job using CmisRepositoryConnector.
      • Update a document on the repository side.
      • The document keeps being submitted to the OutputConnector at every crawling interval, even though it has not been updated since.

      One possible fix is in CmisRepositoryConnector:processDocument, at the call:

      activities.ingestDocumentWithException(nodeId, version, documentURI, rd);
      Here documentURI should point to the old document URI. It currently points to the latest documentURI discovered, which may be confusing the document references.

      Also, in ECM systems such as Alfresco, document IDs include the version number.
      For example: workspace://SpacesStore/8e12a887-3fa8-48d6-8516-5bcfad358ba2;1.0 --> version 1.0
      workspace://SpacesStore/8e12a887-3fa8-48d6-8516-5bcfad358ba2;1.1 --> version 1.1
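
The two IDs above refer to the same node and differ only in the suffix after the semicolon. A minimal sketch of splitting such an ID into a version-independent part and a version label follows. It assumes the label always comes after the last ';'. The class VersionedNodeId is hypothetical and is not part of ManifoldCF or the CMIS connector.

```java
// Hypothetical helper for illustration only; not a ManifoldCF class.
final class VersionedNodeId {
    final String baseId;        // part before the last ';'
    final String versionLabel;  // part after the last ';', or "" if absent

    private VersionedNodeId(String baseId, String versionLabel) {
        this.baseId = baseId;
        this.versionLabel = versionLabel;
    }

    // Assumption: the version label, if present, follows the last ';'.
    static VersionedNodeId parse(String nodeId) {
        int sep = nodeId.lastIndexOf(';');
        if (sep < 0) {
            return new VersionedNodeId(nodeId, "");
        }
        return new VersionedNodeId(nodeId.substring(0, sep),
                                   nodeId.substring(sep + 1));
    }
}
```

With this split, versions 1.0 and 1.1 of the example document share the same baseId, which could serve as a stable key for recognising them as one document.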

      When we set up a query to crawl a repository folder, we discover content by following the child nodes. Because of that, the connector now seems to queue every document version and submit all of them to the OutputConnector, producing duplicate documents on the output (search) side.
      Is there a way to avoid this problem? It would be great if the connector could take only the latest document version and submit it as an update.
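
The requested behaviour, keeping only the latest version of each document, could be sketched as below. This is an illustration under stated assumptions, not the connector's actual fix: it assumes IDs carry a ';' followed by a version label made of dot-separated integers (such as 1.0 or 1.10), and the class LatestVersionFilter and its methods are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not a ManifoldCF class.
final class LatestVersionFilter {

    // Part of the ID before the last ';' (the whole ID if there is none).
    static String baseOf(String nodeId) {
        int sep = nodeId.lastIndexOf(';');
        return sep < 0 ? nodeId : nodeId.substring(0, sep);
    }

    // Version label after the last ';'; "0" is assumed when there is none.
    static String labelOf(String nodeId) {
        int sep = nodeId.lastIndexOf(';');
        return sep < 0 ? "0" : nodeId.substring(sep + 1);
    }

    // Compares labels numerically, component by component, so 1.10 > 1.9.
    // Assumption: every component is an integer; otherwise parseInt throws.
    static int compareLabels(String a, String b) {
        String[] pa = a.split("\\.");
        String[] pb = b.split("\\.");
        int n = Math.max(pa.length, pb.length);
        for (int i = 0; i < n; i++) {
            int x = i < pa.length ? Integer.parseInt(pa[i]) : 0;
            int y = i < pb.length ? Integer.parseInt(pb[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    // Maps each version-independent ID to the full ID of its latest version.
    static Map<String, String> latestPerDocument(List<String> nodeIds) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (String id : nodeIds) {
            String base = baseOf(id);
            String current = latest.get(base);
            if (current == null
                    || compareLabels(labelOf(id), labelOf(current)) > 0) {
                latest.put(base, id);
            }
        }
        return latest;
    }
}
```

Keying the result by the version-independent ID means each document appears once, whichever of its versions were discovered during the crawl.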

        Attachments

        1. std_prints.diff
          9 kB
          Prasad Perera
        2. std_logs.txt
          27 kB
          Prasad Perera

            People

            • Assignee: Karl Wright (kwright@metacarta.com)
            • Reporter: Prasad Perera