  1. CarbonData
  2. CARBONDATA-3519

Optimizations in write step to avoid unnecessary memory block allocation/free


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0
    • Component/s: core
    • Labels:
      None

      Description

       Issue-1:

      Context:

      For a string column with local dictionary enabled, a column page of `UnsafeFixLengthColumnPage` with datatype `DataTypes.BYTE_ARRAY` is created for `encodedPage`, along with the regular `actualPage` of `UnsafeVarLengthColumnPage`.

      `UnsafeFixLengthColumnPage` has a `capacity` field, which indicates the capacity of the `memoryBlock` allocated for the page. The `ensureMemory()` method is called while adding rows to check whether `totalLength + requestSize > capacity`. If there is no room to add the next row, it allocates a new memoryBlock, copies the old content (previous rows) into it, and frees the old memoryBlock.

       Problem:

      When an `UnsafeFixLengthColumnPage` with datatype `DataTypes.BYTE_ARRAY` is created for `encodedPage`, the `capacity` field is not assigned the size of the allocated memory block. Hence, for each row added to the tablePage, the `ensureMemory()` check always fails: a new column page memoryBlock is allocated, the old content (previous rows) is copied, and the old memoryBlock is freed. This allocation of a new memoryBlock and freeing of the old one happens on every row addition for string columns with local dictionary.
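      The effect of the unset `capacity` field can be illustrated with a small stand-alone model. This is only a sketch, not CarbonData code: the class `CapacitySketch`, its `recordCapacity` flag, the 4-byte row size, and the "grow to exactly the needed size" policy are all assumptions made for illustration, and a plain `byte[]` stands in for the off-heap memoryBlock.

```java
import java.util.Arrays;

// Hypothetical stand-in for a fixed-length unsafe column page. Not CarbonData code.
final class CapacitySketch {
    private byte[] memoryBlock;  // stands in for the off-heap memoryBlock
    private int capacity;        // the value ensureMemory() compares against
    private int totalLength;     // bytes written so far
    private int reallocations;   // number of allocate/copy/free cycles

    CapacitySketch(int allocatedBytes, boolean recordCapacity) {
        memoryBlock = new byte[allocatedBytes];
        // Reported problem: the block is allocated, but capacity is left at 0.
        capacity = recordCapacity ? allocatedBytes : 0;
    }

    private void ensureMemory(int requestSize) {
        if (totalLength + requestSize > capacity) {
            // Assumed growth policy: grow to exactly the size needed.
            int newSize = totalLength + requestSize;
            // Allocate a new block and copy the previous rows; the old block is dropped.
            memoryBlock = Arrays.copyOf(memoryBlock, newSize);
            capacity = newSize;
            reallocations++;
        }
    }

    void addRow(int rowSizeInBytes) {
        ensureMemory(rowSizeInBytes);
        totalLength += rowSizeInBytes;
    }

    // Adds `rows` fixed-size rows to a page pre-allocated for exactly that many rows.
    static int reallocationsFor(int rows, boolean recordCapacity) {
        int rowSize = 4; // assumed size of one encoded value
        CapacitySketch page = new CapacitySketch(rows * rowSize, recordCapacity);
        for (int i = 0; i < rows; i++) {
            page.addRow(rowSize);
        }
        return page.reallocations;
    }

    public static void main(String[] args) {
        System.out.println(reallocationsFor(100, false)); // prints 100
        System.out.println(reallocationsFor(100, true));  // prints 0
    }
}
```

      Under these assumptions, the page that never records its capacity reallocates once per row even though the initial block was already large enough, while the page that records it never reallocates.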

       

      Issue-2:

      Context:

      In `VarLengthColumnPageBase`, we have a `rowOffset` column page of `UnsafeFixLengthColumnPage` with datatype `INT` to maintain the data offset of each row for variable length columns. This `rowOffset` page is allocated with a size equal to the page size (the number of rows in the page).

       Problem:

      If we have 10 rows in the page, we need 11 rows for its rowOffset page, because we always keep 0 as the offset of the 1st row. So an additional row is required for the rowOffset page (the code pasted below shows this). Otherwise, the `ensureMemory()` check always fails for the last row of data (the 10th row in this case), which allocates a new rowOffset page memoryBlock, copies the old content (previous rows), and frees the old memoryBlock. This can happen for string columns with local dictionary, direct dictionary columns, and global dictionary columns.

       

      public abstract class VarLengthColumnPageBase extends ColumnPage {
        ...
        @Override
        public void putBytes(int rowId, byte[] bytes) {
          ...
          if (rowId == 0) {
            // offset to 1st row is 0
            rowOffset.putInt(0, 0);
          }
          // the offset for row N is stored at index N + 1, so N rows need N + 1 slots
          rowOffset.putInt(rowId + 1, rowOffset.getInt(rowId) + bytes.length);
          putBytesAtRow(rowId, bytes);
          totalLength += bytes.length;
        }
        ...
      }
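      The off-by-one sizing above can be reproduced with a small model. This is a sketch under stated assumptions, not CarbonData code: `RowOffsetSketch`, `putOffset`, and `reallocationsFor` are hypothetical names, an `int[]` stands in for the `rowOffset` page, and each row is assumed to be 3 bytes long.

```java
import java.util.Arrays;

// Hypothetical model of the rowOffset page. Not CarbonData code.
final class RowOffsetSketch {
    private int[] rowOffset;    // stands in for the INT rowOffset page
    private int reallocations;  // number of times the offset page had to grow

    RowOffsetSketch(int offsetSlots) {
        rowOffset = new int[offsetSlots];
    }

    private void putOffset(int index, int value) {
        if (index >= rowOffset.length) {
            // No room: allocate a bigger array and copy the previous offsets.
            rowOffset = Arrays.copyOf(rowOffset, index + 1);
            reallocations++;
        }
        rowOffset[index] = value;
    }

    // Mirrors the offset bookkeeping of putBytes(): slot 0 holds 0,
    // and slot rowId + 1 holds the end offset of row rowId.
    void putBytes(int rowId, byte[] bytes) {
        if (rowId == 0) {
            putOffset(0, 0);
        }
        putOffset(rowId + 1, rowOffset[rowId] + bytes.length);
    }

    // Writes `pageSize` rows into an offset page created with `offsetSlots` slots.
    static int reallocationsFor(int pageSize, int offsetSlots) {
        RowOffsetSketch page = new RowOffsetSketch(offsetSlots);
        for (int rowId = 0; rowId < pageSize; rowId++) {
            page.putBytes(rowId, new byte[3]); // assumed 3-byte rows
        }
        return page.reallocations;
    }

    public static void main(String[] args) {
        System.out.println(reallocationsFor(10, 10)); // prints 1
        System.out.println(reallocationsFor(10, 11)); // prints 0
    }
}
```

      With 10 slots for 10 rows, the write for the last row lands on index 10 and forces one extra allocation and copy; with 11 slots no reallocation is needed.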
       
      

       

              People

              • Assignee:
                Unassigned
                Reporter:
                VenuReddy Venugopal Reddy K

                  Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 50m