  1. Apache Arrow
  2. ARROW-4503

[C#] ArrowStreamReader allocates and copies data excessively

    Details

      Description

      When reading `RecordBatch` instances, the `ArrowStreamReader` class currently allocates memory three times for the same data and copies that data twice after the initial read.

      1. It allocates memory to hold the data and then reads from the Stream into it.  (This should be the only allocation that is necessary.)
      2. It then creates a new `ArrowBuffer.Builder`, which allocates another `byte[]`, and calls `Append` on it, which copies the values into the new `byte[]`.
      3. Finally, it calls `.Build()` on the `ArrowBuffer.Builder`, which allocates memory from the MemoryPool and copies the intermediate buffer into it.

      We should reduce this overhead to a single allocation (from the MemoryPool) and avoid copying the data more often than necessary.
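
      A minimal sketch of the single-allocation idea, assuming the standard .NET `System.Buffers.MemoryPool<byte>` rather than Arrow's own allocator. The `ReadBody` helper and the `SingleAllocationRead` class are hypothetical names used only for illustration; this is not the actual fix. It rents one buffer from the pool and has the Stream write directly into it, so there is no intermediate `byte[]` and no second copy. It relies on `Stream.Read(Span<byte>)`, which is available in .NET Core 2.1+ and .NET Standard 2.1.

```csharp
using System;
using System.Buffers;
using System.IO;

public static class SingleAllocationRead
{
    // Hypothetical helper: rents one buffer from the pool and fills it
    // directly from the stream. The caller owns (and must dispose) the result.
    public static IMemoryOwner<byte> ReadBody(Stream stream, int length, MemoryPool<byte> pool)
    {
        // The only allocation. The pool may hand back more than `length` bytes.
        IMemoryOwner<byte> owner = pool.Rent(length);
        try
        {
            Span<byte> target = owner.Memory.Span.Slice(0, length);
            int total = 0;
            while (total < length)
            {
                // Stream.Read may return fewer bytes than requested, so loop.
                int read = stream.Read(target.Slice(total));
                if (read == 0)
                {
                    throw new EndOfStreamException("Stream ended before the expected length was read.");
                }
                total += read;
            }
            return owner;
        }
        catch
        {
            // Return the buffer to the pool if the read fails.
            owner.Dispose();
            throw;
        }
    }

    public static void Main()
    {
        byte[] payload = { 1, 2, 3, 4, 5, 6, 7, 8 };
        using (var stream = new MemoryStream(payload))
        using (IMemoryOwner<byte> owner = ReadBody(stream, payload.Length, MemoryPool<byte>.Shared))
        {
            // Slice to the logical length, since the rented buffer can be larger.
            ReadOnlySpan<byte> data = owner.Memory.Span.Slice(0, payload.Length);
            Console.WriteLine(data.SequenceEqual(payload));
        }
    }
}
```

      Because a rented buffer can be larger than requested, any consumer of the result would need to track the logical length separately, as the usage in `Main` does.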

              People

              • Assignee:
                eerhardt Eric Erhardt
                Reporter:
                eerhardt Eric Erhardt
              • Votes:
                1
                Watchers:
                3

                  Time Tracking

                  Estimated:
                  48h
                  Remaining:
                  35h 50m
                  Logged:
                  12h 10m