Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
The heap-memory path of ByteBufferUtil.fallbackRead (see the code on the master branch) massively overallocates memory when the underlying input stream returns data in chunks smaller than the requested length. This happens on a regular basis when using the S3 input stream as input.
The behavior is O(N^2)-ish. In a recent debug session, we were trying to read 6MB but were getting only 16K per read. The code would:
- allocate 16M, use the first 16K
- allocate 16M - 16K, use the first 16K of that
- allocate 16M - 32K, use the first 16K of that
- (etc)
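The quadratic blow-up described above can be illustrated with a small simulation, using the 16M allocation and 16K chunk sizes from the session above. This is a sketch of the allocation pattern only, not the actual ORC code:

```java
// Hypothetical simulation of the overallocation pattern: each iteration
// allocates a buffer big enough for everything still missing, but the
// stream only fills the first `chunk` bytes of it.
public final class OverallocationDemo {
    public static void main(String[] args) {
        final long total = 16L * 1024 * 1024; // initial allocation size (16M)
        final long chunk = 16L * 1024;        // bytes actually returned per read (16K)
        long remaining = total;
        long allocated = 0;
        while (remaining > 0) {
            allocated += remaining;           // models `new byte[remaining]`
            remaining -= Math.min(chunk, remaining);
        }
        // Sum of an arithmetic series: roughly total^2 / (2 * chunk) bytes,
        // i.e. gigabytes of allocation to move a few megabytes.
        System.out.println("useful bytes:    " + total);
        System.out.println("bytes allocated: " + allocated);
    }
}
```

With these numbers the simulation allocates on the order of 8 GB of transient buffers to deliver 16 MB of data, which is the O(N^2) behavior reported above.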
The patch is simple. Here's the text version of the patch:
@@ -88,10 +88,17 @@ public final class ByteBufferUtil {
       buffer.flip();
     } else {
       buffer.clear();
-      int nRead = stream.read(buffer.array(),
-          buffer.arrayOffset(), maxLength);
-      if (nRead >= 0) {
-        buffer.limit(nRead);
+      int totalRead = 0;
+      while (totalRead < maxLength) {
+        final int nRead = stream.read(buffer.array(),
+            buffer.arrayOffset() + totalRead, maxLength - totalRead);
+        if (nRead <= 0) {
+          break;
+        }
+        totalRead += nRead;
+      }
+      if (totalRead >= 0) {
+        buffer.limit(totalRead);
         success = true;
       }
     }
Essentially, this does the same thing the code in the direct-memory path is already doing: loop on read() until the buffer is full or the stream is exhausted, rather than assuming a single read() returns everything.
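The loop the patch introduces can be shown as a standalone sketch. The `readFully` helper and the `ShortReadStream` wrapper below are illustrative names, not the actual ORC code; the wrapper caps each read() at 7 bytes to mimic a network stream that returns short reads:

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Simulates a stream (e.g. an S3 input stream) that never returns
// more than a few bytes per read() call.
final class ShortReadStream extends FilterInputStream {
    ShortReadStream(InputStream in) { super(in); }
    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        return super.read(b, off, Math.min(len, 7)); // cap each read at 7 bytes
    }
}

public final class ReadFullyDemo {
    // Keep calling read() until the buffer is full or the stream ends,
    // mirroring the loop in the patch above.
    static int readFully(InputStream stream, byte[] buf, int off, int maxLength)
            throws IOException {
        int totalRead = 0;
        while (totalRead < maxLength) {
            final int nRead = stream.read(buf, off + totalRead, maxLength - totalRead);
            if (nRead <= 0) {
                break; // end of stream
            }
            totalRead += nRead;
        }
        return totalRead;
    }

    public static void main(String[] args) throws IOException {
        byte[] src = new byte[100];
        byte[] dst = new byte[100];
        // Even though each read() yields at most 7 bytes, the loop fills
        // the whole destination buffer with a single allocation.
        int n = readFully(new ShortReadStream(new ByteArrayInputStream(src)), dst, 0, 100);
        System.out.println(n + " bytes read"); // prints "100 bytes read"
    }
}
```

A single read() against the wrapped stream would have returned 7 and forced the caller to reallocate; the loop absorbs the short reads without any extra allocation.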