  1. ManifoldCF
  2. CONNECTORS-735

Include crawling date as metadata in OutputConnector


    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: ManifoldCF 1.2
    • Fix Version/s: ManifoldCF 1.3
    • Component/s: Framework core
    • Labels:
      None

      Description

      While dating documents is a nightmare (not all connectors obtain their dates in the same way), it might be interesting to leverage the crawl itself to date some volatile media (such as the web).

      In the case of web crawling, there are three dates that can certainly be inferred from the crawler's activity:

      • Date the page first appeared in the queue (loosely equivalent to a created date)
      • Date the page was last checked by the crawler (might not reflect a version update; the content could still be exactly the same)
      • Date of last update (since the URL exists in the queue, it might have changed over time, and the crawler might know about this).
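
      To make the three dates concrete, here is a minimal sketch of how they might be tracked per URL and turned into metadata fields. The class name CrawlDates, its methods, and the field names crawl_first_queued, crawl_last_checked and crawl_last_changed are all hypothetical and are not part of ManifoldCF's API; the sketch also assumes the crawler can tell whether the content changed on a given visit.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical holder for the three crawl-derived dates; not a ManifoldCF class. */
final class CrawlDates {
    // Illustrative metadata field names (assumptions, not ManifoldCF's).
    static final String FIRST_QUEUED = "crawl_first_queued";
    static final String LAST_CHECKED = "crawl_last_checked";
    static final String LAST_CHANGED = "crawl_last_changed";

    private final Instant firstQueued; // set once, when the URL enters the queue
    private Instant lastChecked;       // null until the crawler first visits the URL
    private Instant lastChanged;       // null until a visit observes new content

    CrawlDates(Instant firstQueued) {
        this.firstQueued = firstQueued;
    }

    /** Records one crawler visit; contentChanged says whether the content differed. */
    void recordCheck(Instant when, boolean contentChanged) {
        lastChecked = when;
        if (contentChanged) {
            lastChanged = when;
        }
    }

    /** Returns the known dates as ISO-8601 strings, skipping dates not yet observed. */
    Map<String, String> toMetadata() {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put(FIRST_QUEUED, firstQueued.toString());
        if (lastChecked != null) {
            metadata.put(LAST_CHECKED, lastChecked.toString());
        }
        if (lastChanged != null) {
            metadata.put(LAST_CHANGED, lastChanged.toString());
        }
        return metadata;
    }

    public static void main(String[] args) {
        CrawlDates dates = new CrawlDates(Instant.parse("2013-06-01T00:00:00Z"));
        dates.recordCheck(Instant.parse("2013-06-02T00:00:00Z"), true);  // content changed
        dates.recordCheck(Instant.parse("2013-06-03T00:00:00Z"), false); // content unchanged
        System.out.println(dates.toMetadata());
    }
}
```

      Keeping "last checked" and "last changed" separate reflects the distinction drawn above: a visit always advances the former, but only advances the latter when the content actually differs.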

            People

            • Assignee:
              kwright@metacarta.com Karl Wright
              Reporter:
              gamars Stephane Gamard
            • Votes:
              0
              Watchers:
              2

              Dates

              • Created:
                Updated:
                Resolved: