Nutch / NUTCH-1270

Some Deflate-encoded pages are not fetched


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 1.4
    • Fix Version/s: None
    • Component/s: protocol
    • Labels: software
    • Patch Info: Patch Available

    Description

      Some Deflate-encoded web pages are fetched, but their content cannot be retrieved.
      After the following change, the error no longer occurs.
      We changed lib-http/src/java/org/apache/nutch/protocol/http/api/HttpBase.java (added lines are marked with +):
      public byte[] processDeflateEncoded(byte[] compressed, URL url) throws IOException {

        if (LOGGER.isTraceEnabled()) {
          LOGGER.trace("inflating....");
        }

        byte[] content = DeflateUtils.inflateBestEffort(compressed, getMaxContent());
      + if (content == null)
      +   content = DeflateUtils.inflateBestEffort(compressed, 200000);

        if (content == null)
          throw new IOException("inflateBestEffort returned null");

        if (LOGGER.isTraceEnabled()) {
          LOGGER.trace("fetched " + compressed.length
              + " bytes of compressed content (expanded to "
              + content.length + " bytes) from " + url);
        }

        return content;
      }
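
      The retry in the patch above can be tried outside Nutch with a small standalone sketch. Everything in it is an assumption made for illustration: the class DeflateFallbackSketch, the helper deflateRaw, and the local inflateBestEffort are hypothetical stand-ins built on java.util.zip, not the real DeflateUtils. In particular, this stand-in returns null for a non-positive limit or when nothing can be inflated, which may differ from how the Nutch method behaves.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class DeflateFallbackSketch {

  // Hypothetical stand-in for DeflateUtils.inflateBestEffort (not the Nutch code).
  // Inflates raw deflate data, keeps at most sizeLimit bytes, and returns null
  // if the limit is not positive or no bytes could be inflated.
  static byte[] inflateBestEffort(byte[] compressed, int sizeLimit) {
    if (sizeLimit <= 0) {
      return null; // assumption of this sketch only
    }
    Inflater inflater = new Inflater(true); // true = raw deflate, no zlib header
    // The Inflater documentation asks for one extra dummy byte in "nowrap" mode.
    inflater.setInput(Arrays.copyOf(compressed, compressed.length + 1));
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    try {
      while (!inflater.finished() && out.size() < sizeLimit) {
        int n = inflater.inflate(buf, 0, Math.min(buf.length, sizeLimit - out.size()));
        if (n == 0) {
          break; // no progress: input exhausted or a dictionary is required
        }
        out.write(buf, 0, n);
      }
    } catch (DataFormatException e) {
      // Best effort: keep whatever was inflated before the error.
    } finally {
      inflater.end();
    }
    return out.size() == 0 ? null : out.toByteArray();
  }

  // Mirrors the patched logic: try the configured limit first, then retry once
  // with a fixed fallback limit before giving up.
  static byte[] inflateWithFallback(byte[] compressed, int maxContent, int fallbackLimit)
      throws IOException {
    byte[] content = inflateBestEffort(compressed, maxContent);
    if (content == null) {
      content = inflateBestEffort(compressed, fallbackLimit);
    }
    if (content == null) {
      throw new IOException("inflateBestEffort returned null");
    }
    return content;
  }

  // Helper that produces raw deflate test data (hypothetical, for the demo only).
  static byte[] deflateRaw(byte[] data) {
    Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
    deflater.setInput(data);
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    while (!deflater.finished()) {
      out.write(buf, 0, deflater.deflate(buf));
    }
    deflater.end();
    return out.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] page = "<html>example page</html>".getBytes(StandardCharsets.UTF_8);
    byte[] compressed = deflateRaw(page);
    // With a limit of -1 the first attempt returns null in this sketch,
    // so the content comes from the retry with the 200000-byte limit.
    byte[] content = inflateWithFallback(compressed, -1, 200000);
    System.out.println(new String(content, StandardCharsets.UTF_8));
  }
}
```

      One consequence of this design is that a page which fails with the configured limit is then capped at the hard-coded fallback of 200000 bytes, regardless of the configured value.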

      Attachments

        1. NUTCH-1270.patch
          0.6 kB
          behnam nikbakht

            People

              Assignee: Unassigned
              Reporter: behnam nikbakht
              Votes: 0
              Watchers: 1

              Dates

                Created:
                Updated:
                Resolved: