Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.9
    • Component/s: None
    • Labels:
      None

      Description

      A normalizer for query strings. Sorting query string parameters helps prevent duplicates on some (bad) websites, where the same page is reachable under several parameter orderings.
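
The idea in the description can be sketched roughly as follows. This is a minimal illustration only, not the code from the attached patches: the class name `QuerySortSketch` and the method `sortQuery` are hypothetical, and the sketch assumes parameters are separated by `&` and compared as raw strings, with no decoding.

```java
import java.util.Arrays;

public class QuerySortSketch {

    // Hypothetical helper (not the plugin's API): returns the URL with its
    // query parameters sorted lexicographically as raw "key=value" strings.
    public static String sortQuery(String url) {
        // Split off any fragment first so it is never part of the sort.
        int hash = url.indexOf('#');
        String fragment = hash >= 0 ? url.substring(hash) : "";
        String base = hash >= 0 ? url.substring(0, hash) : url;

        int q = base.indexOf('?');
        if (q < 0 || q == base.length() - 1) {
            return url; // no query string, nothing to sort
        }

        // Assumption: '&' is the only parameter separator.
        String[] params = base.substring(q + 1).split("&");
        Arrays.sort(params);
        return base.substring(0, q + 1) + String.join("&", params) + fragment;
    }

    public static void main(String[] args) {
        // Both orderings normalize to the same URL.
        System.out.println(sortQuery("http://example.com/page?b=2&a=1"));
        System.out.println(sortQuery("http://example.com/page?a=1&b=2"));
    }
}
```

With this sketch, both example URLs come out as `http://example.com/page?a=1&b=2`, so a crawler would treat them as one entry instead of two.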

      1. NUTCH-1327-1.8-1.patch
        10 kB
        Markus Jelsma
      2. NUTCH-1327-1.8-2.patch
        12 kB
        Markus Jelsma

        Activity

        Markus Jelsma created issue -
        Lewis John McGibbney made changes -
        Field Original Value New Value
        Fix Version/s 1.7 [ 12323281 ]
        Fix Version/s 1.6 [ 12319941 ]
        Lewis John McGibbney made changes -
        Fix Version/s 1.9 [ 12324611 ]
        Fix Version/s 1.7 [ 12323281 ]
        Markus Jelsma added a comment -

        Patch for trunk. It rebuilds the URL with querystring parameters properly sorted.

        Markus Jelsma made changes -
        Attachment NUTCH-1327-1.8-1.patch [ 12587619 ]
        Markus Jelsma added a comment -

        Any comments? Thanks

        Tejas Patil added a comment -

        Hi Markus,

        1. The patch, applied as is, didn't compile the plugin. I had to add entries to src/plugin/build.xml to get it compiled.
        2. Could you add some javadoc comments to the QuerystringURLNormalizer class so that people can quickly get an idea of what this plugin does?

        lufeng added a comment -

        Hi Markus, I tested your patch. Did you forget to add the deploy and test targets to src/plugin/build.xml?

        +1

        Markus Jelsma added a comment -

        Thanks! I always forget something! Here's a new one plus comment!

        Markus Jelsma made changes -
        Attachment NUTCH-1327-1.8-2.patch [ 12590262 ]
        Markus Jelsma added a comment -

        Thanks. Committed for trunk in rev. 1498832.

        Markus Jelsma made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Hudson added a comment -

        Integrated in Nutch-trunk #2265 (See https://builds.apache.org/job/Nutch-trunk/2265/)
        NUTCH-1327 QueryStringNormalizer (Revision 1498830)

        Result = SUCCESS
        markus : http://svn.apache.org/viewvc/nutch/trunk/?view=rev&rev=1498830
        Files :

        • /nutch/trunk/CHANGES.txt
        • /nutch/trunk/src/java/org/apache/nutch/crawl/CrawlDbReader.java

          People

          • Assignee:
            Markus Jelsma
            Reporter:
            Markus Jelsma
          • Votes:
            0
            Watchers:
            4