Nutch / NUTCH-1346

Follow outlinks to ignore external

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.5
    • Fix Version/s: 1.6
    • Component/s: fetcher
    • Labels: None
    • Patch Info: Patch Available

      Description

      The fetcher's follow-outlinks feature already respects the db.ignore.external.links setting. However, enabling that setting to keep the outlink follower from going external also means that external outlinks of fetched pages are not saved in the parse data. A new setting should stop the outlink follower from going external while still storing external outlinks.
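      The description separates two decisions: whether an external outlink is stored in the parse data, and whether the fetcher follows it. The Java sketch below illustrates that split. It is a minimal, hypothetical illustration, not the code committed to Fetcher.java: the class OutlinkPolicy, its methods, and the flag followIgnoreExternal (standing in for the new setting, whose real property name lives in conf/nutch-default.xml) are assumptions, and "external" is assumed to mean "on a different host".

```java
import java.net.URI;

/**
 * Minimal sketch of the NUTCH-1346 behaviour: storing an outlink in the
 * parse data is decided separately from following it.
 * ASSUMPTION: not Nutch's actual Fetcher code; all names are hypothetical.
 */
public class OutlinkPolicy {
    // Mirrors db.ignore.external.links: when true, external outlinks are dropped entirely.
    private final boolean ignoreExternalLinks;
    // Hypothetical stand-in for the new setting: when true, external outlinks are not followed.
    private final boolean followIgnoreExternal;

    public OutlinkPolicy(boolean ignoreExternalLinks, boolean followIgnoreExternal) {
        this.ignoreExternalLinks = ignoreExternalLinks;
        this.followIgnoreExternal = followIgnoreExternal;
    }

    // ASSUMPTION: "external" means a different host (compared case-insensitively).
    // A URL without a parseable host is treated as external.
    static boolean isExternal(String fromUrl, String toUrl) {
        String fromHost = URI.create(fromUrl).getHost();
        String toHost = URI.create(toUrl).getHost();
        if (fromHost == null || toHost == null) {
            return true;
        }
        return !fromHost.equalsIgnoreCase(toHost);
    }

    // Whether the outlink is kept in the parse data.
    public boolean store(String fromUrl, String toUrl) {
        return !(ignoreExternalLinks && isExternal(fromUrl, toUrl));
    }

    // Whether the fetcher follows the outlink; a link that is not stored is never followed.
    public boolean follow(String fromUrl, String toUrl) {
        if (!store(fromUrl, toUrl)) {
            return false;
        }
        return !(followIgnoreExternal && isExternal(fromUrl, toUrl));
    }

    public static void main(String[] args) {
        // Store external outlinks, but do not follow them.
        OutlinkPolicy policy = new OutlinkPolicy(false, true);
        String page = "http://example.com/index.html";
        System.out.println(policy.store(page, "http://other.org/"));          // true
        System.out.println(policy.follow(page, "http://other.org/"));         // false
        System.out.println(policy.follow(page, "http://example.com/about"));  // true
    }
}
```

      With db.ignore.external.links turned off and the hypothetical follow flag turned on, an external link is kept in the parse data but not followed, which is the combination the issue asks for. Turning db.ignore.external.links on still drops external links from both paths.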


          Activity

          Lewis John McGibbney made changes -
          Status: Resolved → Closed
          Hudson added a comment -

          Integrated in Nutch-trunk #1865 (See https://builds.apache.org/job/Nutch-trunk/1865/)
          NUTCH-1346 Follow outlinks to ignore external (Revision 1347897)

          Result = SUCCESS
          markus : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1347897
          Files :

          • /nutch/trunk/CHANGES.txt
          • /nutch/trunk/conf/nutch-default.xml
          • /nutch/trunk/src/java/org/apache/nutch/fetcher/Fetcher.java
          Hudson added a comment -

          Integrated in nutch-trunk-maven #301 (See https://builds.apache.org/job/nutch-trunk-maven/301/)
          NUTCH-1346 Follow outlinks to ignore external (Revision 1347897)

          Result = SUCCESS
          markus :
          Files :

          • /nutch/trunk/CHANGES.txt
          • /nutch/trunk/conf/nutch-default.xml
          • /nutch/trunk/src/java/org/apache/nutch/fetcher/Fetcher.java
          Markus Jelsma made changes -
          Status: Open → Resolved
          Resolution: Fixed
          Markus Jelsma added a comment -

          Committed for 1.6 in rev. 1347897.

          Markus Jelsma made changes -
          Patch Info: Patch Available
          Markus Jelsma made changes -
          Attachment: NUTCH-1346-1.6-1.patch
          Markus Jelsma added a comment -

          Patch for 1.6!

          Markus Jelsma made changes -
          Link: This issue is part of NUTCH-1184
          Markus Jelsma created issue -

            People

            • Assignee: Markus Jelsma
            • Reporter: Markus Jelsma
            • Votes: 0
            • Watchers: 2
