ManifoldCF / CONNECTORS-214

Add post-extraction inclusions and exclusions into the web connector


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: ManifoldCF 0.1, ManifoldCF 0.2
    • Fix Version/s: ManifoldCF 0.3
    • Component/s: Web connector
    • Labels: None

    Description

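      If HTML files are excluded from a job, the links in those files are not followed, so documents reachable only through those pages are never discovered. If we add inclusion and exclusion filters that are applied after link extraction, HTML pages can still be fetched and parsed for links, and it becomes possible to keep only certain types of documents, such as PDFs.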
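
The requested behavior can be illustrated with a minimal sketch. It is not the web connector's implementation or API: the `PostExtractionFilter` class, the `crawl` function, and the in-memory `pages` map (URL to extracted links) are all hypothetical names chosen for illustration. The point it shows is that every page is still followed for links, while the inclusion and exclusion patterns decide only which documents are kept.

```python
import re
from collections import deque
from dataclasses import dataclass


@dataclass
class PostExtractionFilter:
    """Hypothetical filter applied after link extraction (not ManifoldCF's API)."""
    include: tuple = (r".*",)   # regexes; a URL must match at least one
    exclude: tuple = ()         # regexes; a URL must match none

    def allows(self, url: str) -> bool:
        included = any(re.search(p, url) for p in self.include)
        excluded = any(re.search(p, url) for p in self.exclude)
        return included and not excluded


def crawl(seed, pages, index_filter):
    """Breadth-first crawl over `pages`, a dict of URL -> extracted links.

    Links are followed for every fetched page; the filter is consulted
    only afterwards, to decide whether the page itself is kept.
    """
    seen = {seed}
    queue = deque([seed])
    kept = []
    while queue:
        url = queue.popleft()
        # Step 1: extract and enqueue links, regardless of the filter.
        for link in pages.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
        # Step 2: apply inclusions/exclusions after extraction.
        if index_filter.allows(url):
            kept.append(url)
    return kept


# Assumed toy site: two HTML pages linking to two PDFs.
pages = {
    "http://example.com/": ["http://example.com/a.html",
                            "http://example.com/doc1.pdf"],
    "http://example.com/a.html": ["http://example.com/doc2.pdf"],
}
pdf_only = PostExtractionFilter(include=(r"\.pdf$",))
print(crawl("http://example.com/", pages, pdf_only))
```

With a filter applied before fetching, excluding the HTML pages would have hidden `doc2.pdf`, since it is reachable only through `a.html`.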

          People

            kwright@metacarta.com Karl Wright
            erlendfg Erlend Garåsen
            Votes: 0
            Watchers: 0

            Dates

              Created:
              Updated:
              Resolved: