Apache Arrow / ARROW-1943

Handle setInitialCapacity() for deeply nested lists of lists

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.9.0
    • Component/s: Java

      Description

      The current implementation of setInitialCapacity() multiplies the requested capacity by a factor of 5 at every level of list nesting.

      So if the schema is LIST (LIST (LIST (LIST (LIST (LIST (LIST (BIGINT))))))) and we start with an initial capacity of 128, the BigIntVector ends up throwing OversizedAllocationException. The capacity is multiplied by 5 at every level, so by the time we reach the inner scalar vector that actually stores the data, the requested size is well over the maximum size limit per vector (1MB).
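
      To make the growth concrete, here is a minimal sketch of the arithmetic only. It does not use the Arrow API: the class NestedCapacitySketch and the helper innerCapacity are hypothetical names, and it assumes each of the 7 list levels applies the factor of 5 exactly once before passing the capacity to its child.

```java
public class NestedCapacitySketch {

    // Hypothetical helper (not part of Arrow): the capacity that reaches the
    // innermost vector when every list level multiplies the capacity it was
    // given by `factor` before handing it to its child.
    static long innerCapacity(long initialCapacity, int listDepth, int factor) {
        long capacity = initialCapacity;
        for (int level = 0; level < listDepth; level++) {
            // multiplyExact throws ArithmeticException on overflow instead of wrapping
            capacity = Math.multiplyExact(capacity, (long) factor);
        }
        return capacity;
    }

    public static void main(String[] args) {
        // Schema from the description: 7 nested LIST levels around a BIGINT
        long values = innerCapacity(128, 7, 5);
        long bytes = values * Long.BYTES; // a BIGINT value is 8 bytes
        System.out.println(values + " values, " + bytes + " bytes");
        // prints "10000000 values, 80000000 bytes"
    }
}
```

      Under these assumptions the innermost BigIntVector is asked for 10,000,000 values, roughly 80 MB of data, from an initial capacity of only 128.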

      We saw this problem in Dremio when we failed to read deeply nested JSON data.

              People

              • Assignee: Siddharth Teotia (siddteotia)
              • Reporter: Siddharth Teotia (siddteotia)