Nutch / NUTCH-2561 (sub-task of NUTCH-2549: protocol-http does not behave the same as browsers)

protocol-http can be made to read arbitrarily large HTTP responses


    Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.14
    • Fix Version/s: 1.15
    • Component/s: None
    • Labels:
      None

      Description

      protocol-http limits the size of the HTTP response body. However:

      • There is no limit on the size of the HTTP headers it reads. A bogus server could send an infinite stream of different HTTP headers and cause the fetcher to run out of memory, or send the same HTTP header repeatedly and cause the fetcher to time out.
      • The same goes for the HTTP status line: no check is made concerning its size.

      This can be both a performance and a security problem.
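
      To illustrate the kind of check that is missing, here is a minimal sketch in Python of reading a status line and headers under explicit limits. It is not the Nutch code or the fix that was applied; the function names, the exception class, and the three limits (MAX_LINE_BYTES, MAX_HEADER_COUNT, MAX_HEADER_BYTES) are hypothetical and chosen only for the example.

```python
import io

# Hypothetical limits: names and values are assumptions, not Nutch settings.
MAX_LINE_BYTES = 8192      # longest status line or header line accepted
MAX_HEADER_COUNT = 100     # most header lines accepted
MAX_HEADER_BYTES = 65536   # most bytes accepted for the whole header section


class HeaderLimitExceeded(Exception):
    """Raised when the server sends more header data than the limits allow."""


def read_bounded_line(stream, limit=MAX_LINE_BYTES):
    """Read one line, failing if it is longer than `limit` bytes."""
    # readline(n) returns at most n bytes, so asking for limit + 1 detects an
    # over-long line without buffering the whole line in memory.
    line = stream.readline(limit + 1)
    if len(line) > limit:
        raise HeaderLimitExceeded("line longer than %d bytes" % limit)
    return line


def read_status_and_headers(stream):
    """Return (status_line, [(name, value), ...]) from a binary stream."""
    status_line = read_bounded_line(stream).rstrip(b"\r\n")
    headers = []
    total = 0
    while True:
        line = read_bounded_line(stream)
        total += len(line)
        if total > MAX_HEADER_BYTES:
            raise HeaderLimitExceeded("header section too large")
        stripped = line.rstrip(b"\r\n")
        if not stripped:
            # A blank line (or end of stream) ends the header section.
            break
        if len(headers) >= MAX_HEADER_COUNT:
            raise HeaderLimitExceeded("too many headers")
        name, _, value = stripped.partition(b":")
        headers.append((name.strip().decode("latin-1"),
                        value.strip().decode("latin-1")))
    return status_line.decode("latin-1"), headers


if __name__ == "__main__":
    demo = io.BytesIO(b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n")
    print(read_status_and_headers(demo))
```

      With a real connection the stream could come from sock.makefile("rb"). These size limits bound memory use only; a server that sends data very slowly would still need an overall deadline, which this sketch does not cover.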

      Attached is an example Python implementation of a server that makes protocol-http receive huge amounts of data and use a lot of CPU (because of NUTCH-2563), without being stopped by either http.getTimeout() or http.getMaxContent().
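
      The attachment itself is not reproduced here. As a rough illustration of the behaviour described above, the following hypothetical sketch builds the start of a response whose header section never ends: a valid status line followed by distinct header lines and no terminating blank line. The names endless_headers and response_prefix are invented for this example and do not come from evilserver.py.

```python
import itertools


def endless_headers():
    """Yield distinct, syntactically valid header lines forever."""
    for i in itertools.count():
        yield ("X-Filler-%d: %s\r\n" % (i, "a" * 64)).encode("ascii")


def response_prefix(n_headers):
    """Return the status line plus the first n_headers filler headers.

    The blank line that would end the header section is never added, so a
    client without a header limit keeps reading for as long as more is sent.
    """
    status = b"HTTP/1.1 200 OK\r\n"
    return status + b"".join(itertools.islice(endless_headers(), n_headers))


if __name__ == "__main__":
    print(response_prefix(3).decode("ascii"))
```

      A test server would write such lines to the socket in a loop. Because each header name differs, a client that stores every header grows without bound.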

        Attachments

        1. evilserver.py
          1 kB
          Gerard Bouchar

            People

            • Assignee: Unassigned
            • Reporter: Gerard Bouchar (gbouchar)
            • Votes: 0
            • Watchers: 2
