Description
contentEquals() internally wraps any given InputStream/Reader in a buffered version (if it is not already buffered), which avoids most of the I/O penalty, but it then reads each byte/character one at a time. This leads to significantly more method calls and a lot of byte-to-int conversion, since the single-byte read() method returns an int between 0 and 255 (or -1 at end of stream) instead of a byte.
I have a change that modifies the contentEquals() methods to read content into byte/char arrays and compare those arrays in batches using Arrays.equals, instead of wrapping the input in a BufferedInputStream or BufferedReader and calling the single byte/char read() methods. This reduces the number of method invocations by a factor roughly equal to the buffer size and avoids widening every byte read to an int.
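The batch approach described above can be sketched roughly as follows. This is a minimal illustration, not the actual patch: the class name ContentEqualsSketch, the helper fill(), and the 8192-byte buffer size are all assumptions made for the example. It fills both buffers completely before comparing, because read(byte[], int, int) may return fewer bytes than requested.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical sketch of a chunked contentEquals(); names and sizes are assumptions.
final class ContentEqualsSketch {

    // Assumed buffer size; the real change may use a different value.
    private static final int BUFFER_SIZE = 8192;

    static boolean contentEquals(InputStream a, InputStream b) throws IOException {
        if (a == b) {
            return true;
        }
        byte[] bufA = new byte[BUFFER_SIZE];
        byte[] bufB = new byte[BUFFER_SIZE];
        while (true) {
            int nA = fill(a, bufA);
            int nB = fill(b, bufB);
            if (nA != nB) {
                return false; // one stream ended before the other
            }
            if (nA == BUFFER_SIZE) {
                // Full chunk: one batch comparison instead of BUFFER_SIZE read() calls.
                if (!Arrays.equals(bufA, bufB)) {
                    return false;
                }
            } else {
                // Final partial chunk: compare only the bytes actually read.
                for (int i = 0; i < nA; i++) {
                    if (bufA[i] != bufB[i]) {
                        return false;
                    }
                }
                return true; // both streams reached end of stream together
            }
        }
    }

    // Reads until the buffer is full or the stream ends; returns the byte count read.
    private static int fill(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n < 0) {
                break;
            }
            total += n;
        }
        return total;
    }
}
```

A Reader variant would follow the same shape with char[] buffers and Reader.read(char[], int, int).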
The following results show the performance increase over 1000 iterations of comparing two 1 GB InputStreams of binary data (held in memory to avoid I/O). The test was performed on an EC2 M4.4XL host using Java 1.8.0.232, with a forced System.gc() between iterations to avoid GC as a source of latency. Times are before and after the change:
Average: 7236 ms to 858 ms (8.43x speedup)
P50: 7224 ms to 856 ms (8.44x speedup)
P90: 7249 ms to 860 ms (8.43x speedup)
P99: 7410 ms to 913 ms (8.12x speedup)
P100: 8330 ms to 1278 ms (6.52x speedup)
The following results show the performance increase over 1000 iterations of comparing two 1 GB Readers of character data (held in memory to avoid I/O). The test was performed on an EC2 M4.4XL host using Java 1.8.0.232, with a forced System.gc() between iterations to avoid GC as a source of latency. Times are before and after the change:
Average: 11281 ms to 1737 ms (6.50x speedup)
P50: 11262 ms to 1735 ms (6.49x speedup)
P90: 11292 ms to 1741 ms (6.49x speedup)
P99: 11707 ms to 1774 ms (6.60x speedup)
P100: 12176 ms to 1884 ms (6.46x speedup)