Apache Drill / DRILL-6161

Allocate memory for outgoing vectors based on sizing calculations

    Details

      Description

      Currently, Drill allocates memory for outgoing value vectors in one of two ways: either for the maximum of 64K entries up front, or starting at 4096 entries and doubling whenever more memory is needed. Every time we double, we allocate a new vector, copy the existing data into it, and zero-fill the new half. This carries a performance penalty. As part of the batch sizing project, we limit the number of rows in the outgoing batch based on memory, using sizing information from the incoming batch(es). Since we know the number of rows and the average size of each column in the outgoing batch, we should use that information to preallocate memory for the outgoing vectors. This will be done as each operator is changed to produce batches of the configured size.
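      To make the cost of doubling concrete, here is a minimal, self-contained sketch. It does not use Drill's vector or allocator classes; the class `VectorPrealloc` and all of its method names are hypothetical and exist only for illustration. It counts how many reallocations and copied values a 4096-entry start would incur for a given row count, and computes the bytes a column would need if sized up front from the row count and average column width.

```java
// Hypothetical illustration only; none of these names are Drill APIs.
final class VectorPrealloc {

    /** Number of reallocations when capacity starts at initialValues and doubles until it holds neededValues. */
    static int reallocationsByDoubling(int initialValues, int neededValues) {
        long capacity = initialValues;
        int reallocations = 0;
        while (capacity < neededValues) {
            capacity *= 2;
            reallocations++;
        }
        return reallocations;
    }

    /** Total values copied across all doublings (each doubling copies the old vector's full capacity). */
    static long valuesCopiedByDoubling(int initialValues, int neededValues) {
        long capacity = initialValues;
        long copied = 0;
        while (capacity < neededValues) {
            copied += capacity; // existing contents move into the new, larger vector
            capacity *= 2;
        }
        return copied;
    }

    /** Bytes to reserve up front for one column, given the outgoing row count and average value width. */
    static long preallocBytes(int rowCount, int avgBytesPerValue) {
        return (long) rowCount * avgBytesPerValue;
    }

    public static void main(String[] args) {
        int rowCount = 20_000;      // assumed outgoing row count
        int avgBytesPerValue = 50;  // assumed average column width

        System.out.println("reallocations with doubling: " + reallocationsByDoubling(4096, rowCount));
        System.out.println("values copied with doubling: " + valuesCopiedByDoubling(4096, rowCount));
        System.out.println("bytes to preallocate:        " + preallocBytes(rowCount, avgBytesPerValue));
    }
}
```

      Under these assumed numbers, growing from 4096 to hold 20,000 values takes 3 reallocations and copies 28,672 values, whereas sizing the vector once from the known row count avoids both the copies and the zero-fill of each new half.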

      Another possible improvement is to pack the value vectors as densely as possible to improve overall memory utilization. Since we allocate memory in powers of 2, once we figure out the number of rows to include in the outgoing batch, we can round it down to the nearest power of 2 and allocate memory for that many rows.
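      The rounding step can be sketched as follows. This is an assumption-laden illustration, not the actual Drill change: the class `PowerOfTwoRows`, its methods, and the 16 MiB batch limit used in `main` are hypothetical. The only library call it relies on is `Integer.highestOneBit`, which returns the largest power of 2 that is less than or equal to a positive int.

```java
// Hypothetical illustration only; none of these names are Drill APIs.
final class PowerOfTwoRows {

    static final int MAX_ROWS = 65_536; // the 64K-entry ceiling mentioned in the description

    /** Rows that fit in the batch memory limit, capped at MAX_ROWS and never less than 1. */
    static int rowsThatFit(long batchBytes, int avgRowBytes) {
        if (batchBytes <= 0 || avgRowBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        long rows = batchBytes / avgRowBytes;
        return (int) Math.max(1, Math.min(rows, MAX_ROWS));
    }

    /** Largest power of 2 that does not exceed rowCount. */
    static int roundDownToPowerOfTwo(int rowCount) {
        if (rowCount < 1) {
            throw new IllegalArgumentException("rowCount must be at least 1");
        }
        return Integer.highestOneBit(rowCount);
    }

    public static void main(String[] args) {
        long batchBytes = 16L * 1024 * 1024; // assumed 16 MiB batch limit
        int avgRowBytes = 300;               // assumed average row width

        int rows = rowsThatFit(batchBytes, avgRowBytes);
        int denseRows = roundDownToPowerOfTwo(rows);
        System.out.println("rows that fit: " + rows);
        System.out.println("rows after rounding down: " + denseRows);
    }
}
```

      With these assumed inputs, 55,924 rows fit in the limit and the rounded row count is 32,768. The trade-off is fewer rows per batch in exchange for vectors whose power-of-2 allocation is fully used rather than partly empty.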

       

              People

              • Assignee:
                ppenumarthy Padma Penumarthy
                Reporter:
                ppenumarthy Padma Penumarthy
                Reviewer:
                Paul Rogers
