  1. Nutch
  2. NUTCH-1090

LinkDb (invertlinks) should inform the user when it ignores internal links


Details

    • Patch Available

    Description

      I used Nutch to crawl sites on a single domain. After the crawl completed I tried to build a LinkDb, but the LinkDb was empty.
      It turned out that this happens because the invertlinks command ignores internal links (links within the same domain) by default.

      Unfortunately, the LinkDb class does not report this, so it was hard to find out why the LinkDb was empty.

      I suggest adding a message that informs the user when the invertlinks command is ignoring internal links.
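
      To illustrate the suggestion, here is a minimal, self-contained sketch of the idea. It is not the attached patch and does not use Nutch classes. The class `InternalLinkNotice` and its methods are hypothetical names, and "internal" is assumed here to mean "same host" as a stand-in for the same-domain check. The boolean flag stands for the setting that controls this behaviour (in Nutch this is, as far as I know, `db.ignore.internal.links`).

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: not part of Nutch and not the attached patch.
class InternalLinkNotice {

    /** Returns the message to show the user, or null when internal links are kept. */
    static String noticeFor(boolean ignoreInternalLinks) {
        if (!ignoreInternalLinks) {
            return null;
        }
        return "LinkDb: internal links will be ignored.";
    }

    /** Assumption: a link is "internal" when source and target share the same host. */
    static boolean isInternal(String fromUrl, String toUrl) {
        String fromHost = URI.create(fromUrl).getHost();
        String toHost = URI.create(toUrl).getHost();
        return fromHost != null && fromHost.equalsIgnoreCase(toHost);
    }

    /** Drops internal targets only when the flag is set; otherwise keeps everything. */
    static List<String> keptTargets(String fromUrl, List<String> targets,
                                    boolean ignoreInternalLinks) {
        List<String> kept = new ArrayList<>();
        for (String target : targets) {
            if (ignoreInternalLinks && isInternal(fromUrl, target)) {
                continue; // silently skipped today; the notice explains why
            }
            kept.add(target);
        }
        return kept;
    }

    public static void main(String[] args) {
        boolean ignoreInternalLinks = true; // stands in for the configured default

        // Tell the user up front, before any links are inverted.
        String notice = noticeFor(ignoreInternalLinks);
        if (notice != null) {
            System.out.println(notice);
        }

        List<String> targets = Arrays.asList(
                "http://example.org/about.html",
                "http://other.example.com/");
        System.out.println(
                keptTargets("http://example.org/index.html", targets, ignoreInternalLinks));
    }
}
```

      With a single-site crawl every target is internal, so the kept list is empty. A one-line notice at startup would make that outcome understandable without reading the source.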

      Attachments

        1. LinkDb.patch
          2 kB
          Marek Bachmann

        Activity

          People

            markus17 Markus Jelsma
            telekoma Marek Bachmann
            Votes: 0
            Watchers: 0

            Dates

              Created:
              Updated:
              Resolved: