Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 1.3
    • Fix Version/s: 1.5
    • Component/s: fetcher
    • Labels:
    • Environment: Any
    • Patch Info: Patch Available

      Description

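      The basic URL normalizer lacks two important features:

      • Encoding spaces in URLs as %20, so that httpclient, and possibly other components that do not expect a space inside a URL, do not break.

      • Decoding unnecessary percent-encodings in URLs, such as %33 (the digit 3). This is important for avoiding duplicates, because URLs that differ only in such escapes refer to the same resource.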
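
      The two requested behaviours can be sketched as follows. This is only an illustration in Python, not the attached patch and not Nutch's actual normalizer code. The function name normalize_escapes is hypothetical, and limiting the decoding to the RFC 3986 "unreserved" set (letters, digits, "-", ".", "_", "~") is an assumption made here so that escapes such as %2F or %20 keep their meaning.

```python
import re

# Assumption: only RFC 3986 "unreserved" characters are safe to decode.
_UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

# Matches one percent-escape such as %33 or %2f.
_PERCENT_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def _decode_if_unreserved(match):
    """Return the literal character for an escape of an unreserved
    character; otherwise return the escape unchanged."""
    char = chr(int(match.group(1), 16))
    return char if char in _UNRESERVED else match.group(0)


def normalize_escapes(url):
    """Hypothetical helper: decode needless escapes, then encode spaces."""
    # Step 1: %33 -> 3, %41 -> A, but %2F and %20 are left alone.
    url = _PERCENT_ESCAPE.sub(_decode_if_unreserved, url)
    # Step 2: a raw space is not valid inside a URL, so emit %20.
    return url.replace(" ", "%20")


if __name__ == "__main__":
    print(normalize_escapes("http://example.com/my page"))
    print(normalize_escapes("http://example.com/item%33"))
```

      With this sketch, "http://example.com/item%33" and "http://example.com/item3" normalize to the same string, which is the duplicate-avoidance effect the description asks for. Decoding runs before space encoding so that the %20 produced in the second step is never reinterpreted.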

        Activity

        Radim Kolar created issue -
        Radim Kolar made changes -
        Field Original Value New Value
        Attachment urlnormalizer.patch [ 12491711 ]
        Radim Kolar made changes -
        Attachment nutch.diff [ 12496406 ]
        Radim Kolar made changes -
        Attachment urlnormalizer.patch [ 12491711 ]
        Markus Jelsma made changes -
        Assignee Markus Jelsma [ markus17 ]
        Radim Kolar made changes -
        Attachment patch-urlnormalizer.diff [ 12499695 ]
        Chris A. Mattmann made changes -
        Fix Version/s 1.5 [ 12318246 ]
        Fix Version/s 1.4 [ 12316519 ]
        Radim Kolar made changes -
        Attachment nutch.diff [ 12496406 ]
        Radim Kolar made changes -
        Attachment patch-urlnormalizer.diff [ 12500437 ]
        Radim Kolar made changes -
        Attachment patch-urlnormalizer.diff [ 12499695 ]
        Radim Kolar made changes -
        Attachment patch-urlnormalizer.diff [ 12500437 ]
        Radim Kolar made changes -
        Attachment patch-with-utf8-encoding.diff [ 12502075 ]
        Radim Kolar made changes -
        Attachment patch-with-utf8-encoding.diff [ 12502075 ]
        Radim Kolar made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Invalid [ 6 ]
        Markus Jelsma made changes -
        Attachment patch-with-utf8-encoding.diff [ 12502393 ]
        Markus Jelsma made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Assignee Markus Jelsma [ markus17 ]

          People

          • Assignee: Unassigned
          • Reporter: Radim Kolar
          • Votes: 0
          • Watchers: 0

            Dates

            • Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated: 4h
              Remaining: 4h
              Logged: Not Specified
