ManifoldCF / CONNECTORS-1562

Documents unreachable due to hopcount are not considered unreachable on cleanup pass

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: ManifoldCF 2.11
    • Fix Version/s: ManifoldCF 2.12
    • Environment: ManifoldCF 2.11
      Elasticsearch 6.3.2

      Web input connector
      Elasticsearch output connector
      Job crawls a website and outputs the content to Elasticsearch

    Description

      My documents aren't removed from the Elasticsearch index after I rerun the job with changed seeds.

      I update my job to change the seeds, then either rerun it or let the scheduler keep running it after the update.
      After the rerun, documents that are no longer reachable are not deleted.
      Documents are only added when they can be reached.
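      To make the expected behavior concrete, here is a simplified model (not ManifoldCF's actual implementation) of what the cleanup pass should do. It assumes the crawl's link graph is known as a dict of URL to outgoing links and that the job caps link depth with a single hop limit; the names `reachable_within` and `cleanup_candidates` are hypothetical. A breadth-first search from the current seeds assigns each document its minimum hop count, and anything still in the index but outside that set should be deleted from the output.

```python
from collections import deque


def reachable_within(links, seeds, max_hops):
    """Return the URLs reachable from `seeds` in at most `max_hops` link hops.

    Hypothetical helper: `links` maps each URL to the URLs it links to.
    """
    hops = {s: 0 for s in seeds}  # BFS discovery order gives minimum hop counts
    queue = deque(seeds)
    while queue:
        url = queue.popleft()
        if hops[url] >= max_hops:
            continue  # do not follow links past the hop limit
        for target in links.get(url, ()):
            if target not in hops:
                hops[target] = hops[url] + 1
                queue.append(target)
    return set(hops)


def cleanup_candidates(indexed, links, seeds, max_hops):
    """Documents still in the index that the current seeds can no longer reach."""
    return set(indexed) - reachable_within(links, seeds, max_hops)


# Seed "x" was removed from the job, and "c" is now beyond the hop limit of 1.
links = {"a": ["b"], "b": ["c"], "x": ["y"]}
indexed = {"a", "b", "c", "x", "y"}
print(sorted(cleanup_candidates(indexed, links, ["a"], 1)))
```

      In the scenario reported here, documents like "c", "x" and "y" stay in the Elasticsearch index instead of being removed on the rerun.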

      Attachments

        1. image-2019-01-09-14-20-50-616.png
          30 kB
          Donald Van den Driessche
        2. Screenshot from 2018-12-31 11-17-29.png
          151 kB
          Tim Steenbeke
        3. manifoldcf.log.reduced
          3.93 MB
          Karl Wright
        4. manifoldcf.log.init
          1.11 MB
          Karl Wright
        5. manifoldcf.log.cleanup
          6 kB
          Karl Wright


          People

            Assignee: Karl Wright (kwright@metacarta.com)
            Reporter: Tim Steenbeke (SteenTi)
            Votes: 0
            Watchers: 3

              Time Tracking

                Original Estimate: 4h
                Remaining Estimate: 4h
                Time Spent: Not Specified