Apache Arrow / ARROW-4503

[C#] ArrowStreamReader allocates and copies data excessively


Details

    Description

      When reading `RecordBatch` instances with the `ArrowStreamReader` class, the reader currently allocates memory and copies the data three times:

      1. It allocates memory to hold the data, then reads from the Stream into it. (This should be the only allocation that is necessary.)
      2. It then creates a new `ArrowBuffer.Builder`, which allocates another `byte[]`, and calls `Append` on it, which copies the values into the new `byte[]`.
      3. Finally, it calls `.Build()` on the `ArrowBuffer.Builder`, which allocates memory from the MemoryPool and copies the intermediate buffer into it.

      We should reduce this overhead to a single allocation (from the MemoryPool) and avoid copying the data more times than necessary.
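
A minimal sketch of the single-allocation read described above, using only .NET base class library types as stand-ins. It is not the Arrow C# implementation: `System.Buffers.MemoryPool<byte>` plays the role of the MemoryPool named in this issue, and `SingleCopyReader.ReadBody` is a hypothetical helper invented for illustration. It assumes a runtime where `Stream.Read(Span<byte>)` is available (.NET Core 2.1 or later).

```csharp
using System;
using System.Buffers;
using System.IO;

// Hypothetical helper, not part of the Arrow C# library.
public static class SingleCopyReader
{
    // Rents one buffer from the pool and fills it directly from the stream,
    // so the bytes are written once, straight into their final location.
    public static IMemoryOwner<byte> ReadBody(Stream stream, int length, MemoryPool<byte> pool)
    {
        // The pool may return a buffer larger than requested; only the
        // first `length` bytes are meaningful to the caller.
        IMemoryOwner<byte> owner = pool.Rent(length);
        try
        {
            Span<byte> destination = owner.Memory.Span.Slice(0, length);
            int total = 0;
            while (total < length)
            {
                // Stream.Read may return fewer bytes than requested,
                // so keep reading until the buffer is full.
                int read = stream.Read(destination.Slice(total));
                if (read == 0)
                {
                    throw new EndOfStreamException(
                        $"Expected {length} bytes but the stream ended after {total}.");
                }
                total += read;
            }
            return owner;
        }
        catch
        {
            // Return the buffer to the pool if the read fails.
            owner.Dispose();
            throw;
        }
    }
}
```

The caller owns the returned `IMemoryOwner<byte>` and should dispose it when the buffer is no longer needed. Because the rented memory can be longer than `length`, consumers should slice it to `length` before use.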


            People

              eerhardt Eric Erhardt

                Time Tracking

                  Original Estimate: 48h
                  Time Spent: 12h 10m
                  Remaining Estimate: 35h 50m