  1. Nutch
  2. NUTCH-560

protocol-httpclient reading more bytes than http.content.limit

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.0, 1.0.0
    • Fix Version/s: 1.0.0
    • Component/s: fetcher
    • Labels: None

    Description

      I modified HttpResponse.java in protocol-httpclient to download files to the file system. If I set http.content.limit to 5000, it fetches around 5500 to 6000 bytes instead and writes them to the file system. There is a calculation mistake in the calculateTryToRead() function.

              int tryAndRead = calculateTryToRead(totalRead);
              while ((bufferFilled = in.read(buffer, 0, buffer.length)) != -1 && tryAndRead > 0) {
                totalRead += bufferFilled;
                out.write(buffer, 0, bufferFilled);
                tryAndRead = calculateTryToRead(totalRead);
              }

      The while loop stops only when calculateTryToRead() returns 0 or a negative value.

      private int calculateTryToRead(int totalRead) {
        int tryToRead = Http.BUFFER_SIZE;
        if (http.getMaxContent() <= 0) {
          return http.BUFFER_SIZE;
        } else if (http.getMaxContent() - totalRead < http.BUFFER_SIZE) {
          tryToRead = http.getMaxContent() - totalRead;
        }
        return tryToRead;
      }

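      It returns 0 or a negative value only once totalRead has already reached or exceeded http.getMaxContent(). Because each in.read() call still requests buffer.length bytes rather than tryAndRead bytes, more bytes than http.content.limit are read before the while loop breaks.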
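
      One possible way to keep the loop within the limit is sketched below. It is only an illustration, not necessarily the fix that was committed: the class name BoundedReadSketch, the readBounded() helper, the maxContent parameter (standing in for http.getMaxContent()), and the BUFFER_SIZE value are all assumptions made for this example. The idea is to clamp the requested size at 0, pass it to in.read() as the length, and check it before reading.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stand-alone sketch; not the actual Nutch HttpResponse code.
class BoundedReadSketch {
    // Assumed buffer size for illustration; the real Http.BUFFER_SIZE may differ.
    static final int BUFFER_SIZE = 1024;

    // Number of bytes to request on the next read; 0 once the limit is reached.
    // maxContent <= 0 is treated as "no limit", mirroring the quoted function.
    static int calculateTryToRead(int totalRead, int maxContent) {
        if (maxContent <= 0) {
            return BUFFER_SIZE;
        }
        return Math.max(0, Math.min(BUFFER_SIZE, maxContent - totalRead));
    }

    // Reads at most maxContent bytes from the stream (all of it if maxContent <= 0).
    static byte[] readBounded(InputStream in, int maxContent) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[BUFFER_SIZE];
        int totalRead = 0;
        int bufferFilled;
        int tryToRead = calculateTryToRead(totalRead, maxContent);
        // Check the remaining budget first, then request only that many bytes.
        while (tryToRead > 0
                && (bufferFilled = in.read(buffer, 0, tryToRead)) != -1) {
            totalRead += bufferFilled;
            out.write(buffer, 0, bufferFilled);
            tryToRead = calculateTryToRead(totalRead, maxContent);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] body = new byte[20000]; // stand-in for a response body
        byte[] limited = readBounded(new ByteArrayInputStream(body), 5000);
        System.out.println(limited.length); // prints 5000
    }
}
```

      Putting the tryToRead > 0 test before in.read() also avoids issuing one extra read after the limit has been reached.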

            People

              Assignee: Dogacan Guney (dogacan)
              Reporter: Joseph M. (josephmanit)