Nutch: NUTCH-1270

Some Deflate-encoded pages are not fetched


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 1.4
    • Fix Version/s: None
    • Component/s: protocol
    • Environment:

      software

    • Patch Info:
      Patch Available

      Description

      Some web pages are fetched, but their content cannot be retrieved.
      We changed lib-http/src/java/org/apache/nutch/protocol/http/api/HttpBase.java as follows, and after this change the error no longer occurs:
      public byte[] processDeflateEncoded(byte[] compressed, URL url) throws IOException {
        if (LOGGER.isTraceEnabled()) {
          LOGGER.trace("inflating....");
        }

        // First attempt: inflate up to the configured content limit.
        byte[] content = DeflateUtils.inflateBestEffort(compressed, getMaxContent());
        // Added by this patch: if the first attempt returns null,
        // retry with a fixed limit of 200000 bytes.
        if (content == null) {
          content = DeflateUtils.inflateBestEffort(compressed, 200000);
        }

        if (content == null) {
          throw new IOException("inflateBestEffort returned null");
        }

        if (LOGGER.isTraceEnabled()) {
          LOGGER.trace("fetched " + compressed.length + " bytes of compressed content (expanded to "
              + content.length + " bytes) from " + url);
        }

        return content;
      }
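      To make the patched control flow concrete, here is a minimal, self-contained sketch that uses only java.util.zip and no Nutch classes. The class DeflateFallbackSketch and its methods inflateBestEffort, processDeflateEncoded, and deflate are hypothetical stand-ins written for illustration; in particular, this inflateBestEffort is assumed to return null when no bytes could be inflated, which is not a description of how Nutch's DeflateUtils behaves. It also assumes zlib-wrapped input (servers that send raw deflate data would need new Inflater(true)).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class DeflateFallbackSketch {

  // Fallback limit copied from the patch.
  static final int FALLBACK_LIMIT = 200000;

  /**
   * Hypothetical stand-in for DeflateUtils.inflateBestEffort: inflates at most
   * sizeLimit bytes of zlib-wrapped data, keeps whatever was produced before a
   * format error, and returns null if nothing was inflated.
   */
  public static byte[] inflateBestEffort(byte[] compressed, int sizeLimit) {
    Inflater inflater = new Inflater(); // expects a zlib header; raw deflate needs new Inflater(true)
    inflater.setInput(compressed);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    try {
      while (!inflater.finished() && out.size() < sizeLimit) {
        int n = inflater.inflate(buf, 0, Math.min(buf.length, sizeLimit - out.size()));
        if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
          break; // truncated stream or preset dictionary: stop with what we have
        }
        out.write(buf, 0, n);
      }
    } catch (DataFormatException e) {
      // Corrupt data: keep any bytes inflated before the error.
    } finally {
      inflater.end();
    }
    return out.size() == 0 ? null : out.toByteArray();
  }

  /** Mirrors the patched logic; maxContent stands in for getMaxContent(). */
  public static byte[] processDeflateEncoded(byte[] compressed, int maxContent) throws IOException {
    byte[] content = inflateBestEffort(compressed, maxContent);
    if (content == null) {
      content = inflateBestEffort(compressed, FALLBACK_LIMIT); // second attempt with the fixed limit
    }
    if (content == null) {
      throw new IOException("inflateBestEffort returned null");
    }
    return content;
  }

  /** Helper that produces zlib-wrapped deflate data for the demo. */
  public static byte[] deflate(byte[] data) {
    Deflater deflater = new Deflater();
    deflater.setInput(data);
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    while (!deflater.finished()) {
      int n = deflater.deflate(buf);
      out.write(buf, 0, n);
    }
    deflater.end();
    return out.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] page = "<html>hello</html>".repeat(50).getBytes(StandardCharsets.UTF_8);
    byte[] compressed = deflate(page);

    // A limit of 0 inflates nothing, so the first attempt returns null and the fallback runs.
    byte[] content = processDeflateEncoded(compressed, 0);
    System.out.println(Arrays.equals(page, content)); // prints true

    // A small positive limit truncates the output instead of failing.
    System.out.println(inflateBestEffort(compressed, 10).length); // prints 10
  }
}
```

      In this sketch, a limit of 0 is only a convenient way to force the first attempt to return null; it is not a claim about which inputs make the real DeflateUtils.inflateBestEffort return null.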

        Attachments

        1. NUTCH-1270.patch (0.6 kB, behnam nikbakht)


              People

              • Assignee:
                Unassigned
              • Reporter:
                behnam nikbakht (behnam.nikbakht)
              • Votes:
                0
              • Watchers:
                1

                Dates

                • Created:
                  Updated:
                  Resolved: