Accumulo / ACCUMULO-2353

Test improvements to java.io.InputStream.skip() for possible Hadoop patch


Details

    • Type: Task
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None
    • Environment: Java 6 Update 45 or later, Hadoop 2.2.0

    Description

      At some point (early Java 7, I think, then backported around Java 6 Update 45), the java.io.InputStream.skip() method was changed from using a byte[512] buffer to a byte[2048] buffer. The difference can be seen in java.util.zip.DeflaterInputStream, whose skip() has not been updated:

          public long skip(long n) throws IOException {
              if (n < 0) {
                  throw new IllegalArgumentException("negative skip length");
              }
              ensureOpen();
      
              // Skip bytes by repeatedly decompressing small blocks
              if (rbuf.length < 512)
                  rbuf = new byte[512];
      
              int total = (int)Math.min(n, Integer.MAX_VALUE);
              long cnt = 0;
              while (total > 0) {
                  // Read a small block of uncompressed bytes
                  int len = read(rbuf, 0, (total <= rbuf.length ? total : rbuf.length));
      
                  if (len < 0) {
                      break;
                  }
                  cnt += len;
                  total -= len;
              }
              return cnt;
          }
      

      and java.io.InputStream.skip() in Java 6 Update 45:

          // MAX_SKIP_BUFFER_SIZE is used to determine the maximum buffer size to
          // use when skipping.
          private static final int MAX_SKIP_BUFFER_SIZE = 2048;

          public long skip(long n) throws IOException {
              long remaining = n;
              int nr;

              if (n <= 0) {
                  return 0;
              }

              int size = (int) Math.min(MAX_SKIP_BUFFER_SIZE, remaining);
              byte[] skipBuffer = new byte[size];

              while (remaining > 0) {
                  nr = read(skipBuffer, 0, (int) Math.min(size, remaining));

                  if (nr < 0) {
                      break;
                  }
                  remaining -= nr;
              }

              return n - remaining;
          }
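
      The practical effect of the larger buffer is fewer decompression round trips per skip: each loop iteration reads at most one buffer's worth of bytes, so a skip of n bytes costs at least ceil(n / bufferSize) read() calls. As a back-of-the-envelope illustration (not part of the original report):

          // Hypothetical illustration: the skip buffer size bounds how many
          // read() calls (and thus decompression round trips) a skip costs.
          public class SkipBufferMath {
              public static void main(String[] args) {
                  long n = 1L << 20; // skip 1 MiB
                  for (int buf : new int[] {512, 2048}) {
                      long reads = (n + buf - 1) / buf; // ceil(n / buf)
                      System.out.printf("buffer=%d -> at least %d read() calls%n", buf, reads);
                  }
              }
          }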
      

      In sample tests I saw about a 20% improvement in skip() when seeking towards the end of a locally cached compressed file (a rough harness for reproducing that kind of measurement is sketched after the code below). Looking at DecompressorStream in Hadoop, its skip() method is a near copy of the old InputStream implementation:

        private byte[] skipBytes = new byte[512];
        @Override
        public long skip(long n) throws IOException {
          // Sanity checks
          if (n < 0) {
            throw new IllegalArgumentException("negative skip length");
          }
          checkStream();
          
          // Read 'n' bytes
          int skipped = 0;
          while (skipped < n) {
            int len = Math.min(((int)n - skipped), skipBytes.length);
            len = read(skipBytes, 0, len);
            if (len == -1) {
              eof = true;
              break;
            }
            skipped += len;
          }
          return skipped;
        }
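
      For reference, a harness along the lines of the "sample tests" above might look like the following. The temp file, data size, skip distance, and the skipWithBuffer() helper are assumptions for illustration, not code from this ticket:

          import java.io.*;
          import java.util.zip.*;

          public class SkipBenchmark {

              // Mirror of the InputStream.skip() loop, but with a caller-chosen
              // buffer size so 512 and 2048 can be compared directly.
              static long skipWithBuffer(InputStream in, long n, int bufSize) throws IOException {
                  byte[] buf = new byte[(int) Math.min(bufSize, n)];
                  long remaining = n;
                  while (remaining > 0) {
                      int nr = in.read(buf, 0, (int) Math.min(buf.length, remaining));
                      if (nr < 0) {
                          break;
                      }
                      remaining -= nr;
                  }
                  return n - remaining;
              }

              public static void main(String[] args) throws IOException {
                  File f = File.createTempFile("skip-bench", ".gz");
                  f.deleteOnExit();

                  // Write 64 MiB of (highly compressible) data to a local gzip file.
                  try (OutputStream out = new GZIPOutputStream(new FileOutputStream(f))) {
                      byte[] chunk = new byte[8192];
                      for (int i = 0; i < 8192; i++) {
                          out.write(chunk);
                      }
                  }

                  long toSkip = 8192L * 8192 - 4096; // land near the end of the stream
                  for (int bufSize : new int[] {512, 2048}) {
                      long start = System.nanoTime();
                      try (InputStream in = new GZIPInputStream(new FileInputStream(f))) {
                          skipWithBuffer(in, toSkip, bufSize);
                      }
                      System.out.printf("buffer=%d: %.1f ms%n", bufSize, (System.nanoTime() - start) / 1e6);
                  }
              }
          }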
      

      This task is to evaluate changes to DecompressorStream, with a possible patch to Hadoop and a possible enhancement request to Oracle to port the InputStream.skip() changes to DeflaterInputStream.skip().
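
      A minimal sketch of what such a DecompressorStream patch might look like, assuming the class context shown above (checkStream(), read(), and the eof field) and borrowing the lazily sized, 2048-capped buffer strategy from the newer InputStream.skip(). This is an illustration of the proposed direction, not a committed Hadoop change:

          private static final int MAX_SKIP_BUFFER_SIZE = 2048;
          private byte[] skipBytes = new byte[0]; // grown on demand, capped at 2048

          @Override
          public long skip(long n) throws IOException {
              // Sanity checks
              if (n < 0) {
                  throw new IllegalArgumentException("negative skip length");
              }
              checkStream();

              // Size the scratch buffer to the skip request, up to the cap, so a
              // large skip needs roughly a quarter as many read() calls as the
              // fixed 512-byte buffer.
              int size = (int) Math.min(MAX_SKIP_BUFFER_SIZE, n);
              if (skipBytes.length < size) {
                  skipBytes = new byte[size];
              }

              long skipped = 0;
              while (skipped < n) {
                  int len = (int) Math.min(n - skipped, skipBytes.length);
                  len = read(skipBytes, 0, len);
                  if (len == -1) {
                      eof = true;
                      break;
                  }
                  skipped += len;
              }
              return skipped;
          }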


            People

              Assignee: Unassigned
              Reporter: Dave Marion (dlmarion)
