NUTCH-1344

BasicURLNormalizer to normalize https same as http

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: nutchgora, 1.6
    • Fix Version/s: 1.6, 2.2
    • Component/s: None
    • Labels:
      None
    • Patch Info:
      Patch Available

      Description

      Most of the normalization done by BasicURLNormalizer (lowercasing the host, removing the default port, removing page anchors, cleaning . and .. in the path) is not applied to URLs with the https protocol.
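
To illustrate the behavior the description asks for, here is a minimal sketch that applies the same normalization steps to both http and https URLs. It is not the attached patch and does not use the Nutch API: the class `UrlNormalizeSketch` and its `normalize` method are hypothetical names, and the sketch relies only on `java.net.URI` from the JDK. It assumes 80 and 443 are the default ports for http and https respectively.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical illustration only; not the BasicURLNormalizer implementation.
class UrlNormalizeSketch {

    // Returns a normalized form of http/https URLs; any other input is returned unchanged.
    static String normalize(String urlString) {
        URI uri;
        try {
            uri = new URI(urlString.trim());
        } catch (URISyntaxException e) {
            return urlString; // leave unparsable input untouched
        }

        String scheme = uri.getScheme();
        String host = uri.getHost();
        if (scheme == null || host == null) {
            return urlString;
        }
        scheme = scheme.toLowerCase(Locale.ROOT);

        // The point of the issue: https must take the same code path as http.
        boolean isHttp = scheme.equals("http");
        boolean isHttps = scheme.equals("https");
        if (!isHttp && !isHttps) {
            return urlString;
        }

        StringBuilder out = new StringBuilder();
        out.append(scheme).append("://");
        if (uri.getRawUserInfo() != null) {
            out.append(uri.getRawUserInfo()).append('@');
        }

        // Lowercase the host.
        out.append(host.toLowerCase(Locale.ROOT));

        // Drop the port when it is the default for the scheme (assumed 80 / 443).
        int defaultPort = isHttps ? 443 : 80;
        int port = uri.getPort();
        if (port != -1 && port != defaultPort) {
            out.append(':').append(port);
        }

        // URI.normalize() removes "." segments and resolves ".." where possible.
        String path = uri.normalize().getRawPath();
        out.append(path == null || path.isEmpty() ? "/" : path);

        if (uri.getRawQuery() != null) {
            out.append('?').append(uri.getRawQuery());
        }
        // The fragment (page anchor) is intentionally not appended.
        return out.toString();
    }
}
```

With this sketch, `UrlNormalizeSketch.normalize("HTTPS://WWW.Example.COM:443/a/./b/../c.html#top")` yields `https://www.example.com/a/c.html`, the same shape of result the http variant with port 80 produces.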

      1. NUTCH-1344.patch
        0.8 kB
        Sebastian Nagel

        Activity

        Sebastian Nagel created issue
        Sebastian Nagel made changes
          Field          Original Value   New Value
          Attachment                      NUTCH-1344.patch [ 12523626 ]
        Sebastian Nagel made changes
          Status         Open [ 1 ]       Resolved [ 5 ]
          Fix Version/s                   1.6 [ 12319941 ]
          Fix Version/s                   2.2 [ 12323285 ]
          Resolution                      Fixed [ 1 ]
        Lewis John McGibbney made changes
          Status         Resolved [ 5 ]   Closed [ 6 ]

          People

          • Assignee:
            Unassigned
          • Reporter:
            Sebastian Nagel
          • Votes:
            0
          • Watchers:
            4

            Dates

            • Created:
              Updated:
              Resolved:
