  Nutch / NUTCH-1346

Follow outlinks to ignore external

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.5
    • Fix Version/s: 1.6
    • Component/s: fetcher
    • Labels: None
    • Patch Info: Patch Available

    Description

      The fetcher's follow-outlinks feature already respects the db.ignore.external.links setting. However, enabling that setting also means that the external outlinks of fetched pages are not saved in the parse data. A new setting should stop the outlink follower from following external links while still storing those external outlinks in the parse data.
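      The requested behaviour can be illustrated with a small, self-contained Java sketch. It is not Nutch's Fetcher code: the class OutlinkPolicy, its methods and the followIgnoreExternal flag are hypothetical names chosen for illustration, and the sketch assumes that "external" means the outlink's host differs from the source page's host. The ignoreExternalLinks flag stands in for db.ignore.external.links.

```java
import java.net.URI;
import java.net.URISyntaxException;

/**
 * Hypothetical sketch of the outlink policy requested in NUTCH-1346.
 * Not taken from Nutch's Fetcher; all names here are illustrative.
 */
public class OutlinkPolicy {

    // Stands in for db.ignore.external.links: external outlinks are dropped from parse data.
    private final boolean ignoreExternalLinks;
    // Hypothetical flag for the new setting: external outlinks are stored but not followed.
    private final boolean followIgnoreExternal;

    public OutlinkPolicy(boolean ignoreExternalLinks, boolean followIgnoreExternal) {
        this.ignoreExternalLinks = ignoreExternalLinks;
        this.followIgnoreExternal = followIgnoreExternal;
    }

    // Assumption: "external" means the hosts differ; unparsable or host-less URLs count as external.
    static boolean isExternal(String fromUrl, String toUrl) {
        String fromHost = hostOf(fromUrl);
        String toHost = hostOf(toUrl);
        return fromHost == null || toHost == null || !fromHost.equalsIgnoreCase(toHost);
    }

    private static String hostOf(String url) {
        try {
            return new URI(url).getHost();
        } catch (URISyntaxException e) {
            return null;
        }
    }

    /** Whether the outlink is kept in the page's parse data. */
    public boolean shouldStore(String fromUrl, String toUrl) {
        return !(ignoreExternalLinks && isExternal(fromUrl, toUrl));
    }

    /** Whether the fetcher's outlink follower may queue the outlink. */
    public boolean shouldFollow(String fromUrl, String toUrl) {
        if (!shouldStore(fromUrl, toUrl)) {
            return false; // links that are not stored are never followed
        }
        return !(followIgnoreExternal && isExternal(fromUrl, toUrl));
    }

    public static void main(String[] args) {
        // db.ignore.external.links=false plus the new flag: store external links, do not follow them.
        OutlinkPolicy policy = new OutlinkPolicy(false, true);
        String page = "http://www.example.com/index.html";
        String external = "http://other.example.org/page.html";
        // prints "store=true follow=false"
        System.out.println("store=" + policy.shouldStore(page, external)
                + " follow=" + policy.shouldFollow(page, external));
    }
}
```

      In this sketch, with db.ignore.external.links off and the new flag on, an external outlink is kept in the parse data but not followed. With db.ignore.external.links on, the new flag makes no difference, because external outlinks are never stored and therefore never followed.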

      Attachments

        1. NUTCH-1346-1.6-1.patch (3 kB, Markus Jelsma)


          Activity

            markus17 Markus Jelsma added a comment -

            Patch for 1.6!

            markus17 Markus Jelsma added a comment -

            Committed for 1.6 in rev. 1347897.

            hudson Hudson added a comment -

            Integrated in nutch-trunk-maven #301 (See https://builds.apache.org/job/nutch-trunk-maven/301/)
            NUTCH-1346 Follow outlinks to ignore external (Revision 1347897)

            Result = SUCCESS
            markus :
            Files :

            • /nutch/trunk/CHANGES.txt
            • /nutch/trunk/conf/nutch-default.xml
            • /nutch/trunk/src/java/org/apache/nutch/fetcher/Fetcher.java
            hudson Hudson added a comment -

            Integrated in Nutch-trunk #1865 (See https://builds.apache.org/job/Nutch-trunk/1865/)
            NUTCH-1346 Follow outlinks to ignore external (Revision 1347897)

            Result = SUCCESS
            markus : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1347897
            Files :

            • /nutch/trunk/CHANGES.txt
            • /nutch/trunk/conf/nutch-default.xml
            • /nutch/trunk/src/java/org/apache/nutch/fetcher/Fetcher.java

            People

              Assignee: markus17 Markus Jelsma
              Reporter: markus17 Markus Jelsma
              Votes: 0
              Watchers: 2
