ManifoldCF / CONNECTORS-256

Connector for crawling Wikis

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: ManifoldCF 0.4
    • Fix Version/s: ManifoldCF 0.4
    • Component/s: Wiki connector
    • Labels: None

    Description

      People have been trying to crawl wikis with ManifoldCF, but using the generic crawler is not a good way to do this. Instead, it looks like we really could use a wiki connector, which would understand the wiki API and thus crawl wiki content quickly and effectively.

      Some pertinent API references follow:

      I don't know whether it is possible to link to a wiki document with just the pageid, but the API can return the URL of the page that a pageid refers to:
      http://en.wikipedia.org/w/api.php?action=query&prop=info&pageids=27697087&inprop=url
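
      A minimal sketch of how a connector might build this lookup and read the page URL out of the reply. It assumes JSON output (`format=json`, which the URL above does not include) and assumes the reply carries the URL in a `fullurl` field keyed by page ID. The helper names and the sample reply are hypothetical and hand-written for illustration, not captured from a live server.

```python
import json
from urllib.parse import urlencode

API_ENDPOINT = "http://en.wikipedia.org/w/api.php"  # endpoint taken from the issue text


def build_info_url(page_id, endpoint=API_ENDPOINT):
    """Build the prop=info query that asks for a page's URL by page ID."""
    params = [
        ("action", "query"),
        ("prop", "info"),
        ("pageids", str(page_id)),
        ("inprop", "url"),
        ("format", "json"),  # assumption: JSON output; the issue's URL omits format
    ]
    return endpoint + "?" + urlencode(params)


def extract_full_url(response_text, page_id):
    """Return the 'fullurl' value for page_id, or None if it is absent."""
    data = json.loads(response_text)
    page = data.get("query", {}).get("pages", {}).get(str(page_id), {})
    return page.get("fullurl")


# Hypothetical reply, shaped like a prop=info result; title and URL are placeholders.
SAMPLE_RESPONSE = json.dumps({
    "query": {
        "pages": {
            "27697087": {
                "pageid": 27697087,
                "title": "Example page",
                "fullurl": "http://en.wikipedia.org/wiki/Example_page",
            }
        }
    }
})

print(build_info_url(27697087))
print(extract_full_url(SAMPLE_RESPONSE, 27697087))
```

      Keeping URL construction separate from parsing means either half can be checked without network access.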

      It is possible to get the metadata of a document using the page's ID (instead of the title) directly:
      Title -> http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=API&rvprop=timestamp|user|comment|content
      PageID -> http://en.wikipedia.org/w/api.php?action=query&prop=revisions&pageids=27697087&rvprop=timestamp|user|comment|content
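
      The two request forms above differ only in the selector parameter (`titles` or `pageids`). A small sketch that builds either one follows; the function name `build_revisions_url` is hypothetical, and the sketch only constructs the URL, it does not fetch or parse anything.

```python
from urllib.parse import urlencode

API_ENDPOINT = "http://en.wikipedia.org/w/api.php"  # endpoint taken from the issue text
RVPROP_FIELDS = ("timestamp", "user", "comment", "content")  # fields listed in the issue


def build_revisions_url(title=None, page_id=None, endpoint=API_ENDPOINT):
    """Build a prop=revisions query selected by title or by page ID (exactly one)."""
    if (title is None) == (page_id is None):
        raise ValueError("pass exactly one of title or page_id")
    if title is not None:
        selector = ("titles", title)
    else:
        selector = ("pageids", str(page_id))
    params = [
        ("action", "query"),
        ("prop", "revisions"),
        selector,
        ("rvprop", "|".join(RVPROP_FIELDS)),
    ]
    # safe="|" keeps the pipe separators literal, matching the URLs in the issue
    return endpoint + "?" + urlencode(params, safe="|")


print(build_revisions_url(title="API"))
print(build_revisions_url(page_id=27697087))
```

      Selecting by page ID avoids any dependence on the title's spelling, which is why the ID form is singled out here.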

          People

            Assignee: Karl Wright (kwright@metacarta.com)
            Reporter: Karl Wright (kwright@metacarta.com)