
Details

    Description

      LazyDecompressingInputStream was introduced in 4.3.1. However, it subclasses InputStream without overriding the multi-byte read(byte[], int, int) method, and the inherited implementation reads one byte at a time.
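The fallback can be reproduced without HttpClient. This is a minimal sketch using a hypothetical SingleByteOnlyStream (not an HttpClient class) that, like LazyDecompressingInputStream in 4.3.1, overrides only the one-byte read() and counts how often it is called. A single bulk read of 4096 bytes through the inherited InputStream.read(byte[], int, int) then costs 4096 calls to read().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ByteByByteDemo {

    // Hypothetical stream that, like the 4.3.1 class, overrides only read().
    static final class SingleByteOnlyStream extends InputStream {
        private final InputStream delegate;
        int singleByteCalls = 0;

        SingleByteOnlyStream(InputStream delegate) {
            this.delegate = delegate;
        }

        @Override
        public int read() throws IOException {
            singleByteCalls++;          // count every one-byte call
            return delegate.read();
        }
        // No override of read(byte[], int, int): the inherited InputStream
        // implementation loops over read(), one call per byte.
    }

    public static void main(String[] args) throws IOException {
        SingleByteOnlyStream in =
                new SingleByteOnlyStream(new ByteArrayInputStream(new byte[8192]));
        byte[] buf = new byte[4096];
        int n = in.read(buf, 0, buf.length);   // one "bulk" read from the caller's view
        // prints "bytes read: 4096, calls to read(): 4096"
        System.out.println("bytes read: " + n + ", calls to read(): " + in.singleByteCalls);
    }
}
```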

      This stack trace shows what happens:

      java.util.zip.Inflater.inflateBytes(Inflater.java:Unknown line)
      java.util.zip.Inflater.inflate(Inflater.java:259)
      java.util.zip.InflaterInputStream.read(InflaterInputStream.java:152)
      java.util.zip.GZIPInputStream.read(GZIPInputStream.java:116)
      java.util.zip.InflaterInputStream.read(InflaterInputStream.java:122)
      org.apache.http.client.entity.LazyDecompressingInputStream.read(LazyDecompressingInputStream.java:56)
      java.io.InputStream.read(InputStream.java:179)
      it.unimi.di.law.warc.util.InspectableCachedHttpEntity.copyContent(InspectableCachedHttpEntity.java:67)

      copyContent() wants to call read(byte[], int, int) to fill a buffer, but since LazyDecompressingInputStream doesn't override that method, the call lands in the byte-by-byte implementation inherited from InputStream. That implementation calls LazyDecompressingInputStream's one-byte read() once for each byte; this invokes the one-byte read() of InflaterInputStream, which performs a multi-byte read of length one on GZIPInputStream, which in turn issues a similar length-one read on InflaterInputStream, which unfortunately ends in a length-one call to the native inflateBytes() method.

      Thus, every byte costs a native-method call. The result is a 10x-50x increase in CPU usage, which turns into a 10x-50x decrease in speed if, as in our case, you have 7000 threads downloading in parallel.

      Overriding read(byte[],int,int) in LazyDecompressingInputStream will solve the problem:

      @Override
      public int read(byte[] b, int off, int len) throws IOException {
          initWrapper();
          return wrapperStream.read(b, off, len);
      }
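A fuller version of the fix might look like the following sketch. LazyGzipInputStream is a hypothetical, GZIP-only stand-in for LazyDecompressingInputStream: the real class is created by HttpClient's decompressing entities, and its constructor and supported encodings may differ. The names wrapperStream and initWrapper() mirror the override proposed in the report. Creation of the GZIPInputStream is deferred because its constructor immediately reads and validates the GZIP header (presumably the reason for the lazy design). Besides read(byte[], int, int), the sketch also forwards skip() and available(). Without those overrides, InputStream would implement skip() by reading and discarding data and would make available() return 0.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical GZIP-only stand-in for LazyDecompressingInputStream.
public class LazyGzipInputStream extends InputStream {
    private final InputStream wrappedStream;  // raw, still-compressed bytes
    private InputStream wrapperStream;        // decompressor, created on first use

    public LazyGzipInputStream(InputStream wrappedStream) {
        this.wrappedStream = wrappedStream;
    }

    private void initWrapper() throws IOException {
        if (wrapperStream == null) {
            // The GZIPInputStream constructor reads the header, so defer it.
            wrapperStream = new GZIPInputStream(wrappedStream);
        }
    }

    @Override
    public int read() throws IOException {
        initWrapper();
        return wrapperStream.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        initWrapper();
        // The fix: one bulk call to the decompressor instead of len one-byte calls.
        return wrapperStream.read(b, off, len);
    }

    @Override
    public long skip(long n) throws IOException {
        initWrapper();
        return wrapperStream.skip(n);
    }

    @Override
    public int available() throws IOException {
        initWrapper();
        return wrapperStream.available();
    }

    @Override
    public void close() throws IOException {
        // If nothing was read, close the source without parsing a GZIP header.
        if (wrapperStream != null) {
            wrapperStream.close();
        } else {
            wrappedStream.close();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "hello, lazily decompressed world".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(compressed)) {
            gz.write(original);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = new LazyGzipInputStream(
                new ByteArrayInputStream(compressed.toByteArray()))) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf, 0, buf.length)) != -1) {
                out.write(buf, 0, n);
            }
        }
        // prints "hello, lazily decompressed world"
        System.out.println(new String(out.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

Because construction does no I/O, a stream that is never read costs nothing beyond closing the underlying connection stream. Once reading starts, buffered copies such as copyContent() reach Inflater in buffer-sized chunks.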


          People

            Assignee: Unassigned
            Reporter: Sebastiano Vigna