  1. CarbonData
  2. CARBONDATA-3293

Prune datamaps improvement for count(*)


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.5.3
    • Component/s: None
    • Labels: None

    Description

      Problem:

      (1) Currently, pruning for count(*) is the same as for a select * query. Blocklet and ExtendedBlocklet objects are built from each DataMapRow, which is unnecessary for a count and is time-consuming.

      (2) Pruning in a select * query spends time in convertToSafeRow(), which converts the DataMapRow to a safe row. The conversion is done because, in an unsafe row, reaching the position of a piece of data requires traversing the whole row.

      (3) For filter queries, the DataMapRow is converted to a safe row whether the blocklet is valid or not. This conversion is time-consuming, and its cost grows with the number of blocklets.

       

      Solution:

      (1) The blocklet row count is already stored in the DataMapRow, so it is enough to read that count. This improves count(*) query performance.

      (2) Also store the data length in the DataMapRow, so that traversing the whole row can be avoided. With the length, the data position can be reached directly.

      (3) Read only the MinMax from the DataMapRow and decide whether a scan is required on that blocklet. Convert the row to a safe row only if the scan is required.
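
      The three points above can be illustrated with a minimal Java sketch. This is not CarbonData code: IndexRow, Blocklet, countStar, prune, and the byte layout are all hypothetical stand-ins chosen for illustration, and the filter is reduced to a single integer equality check. It only shows the general idea of reading the row count directly, storing an offset and length for variable-length data, and checking min/max before building any heavier object.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; none of these names are CarbonData APIs. */
final class PruneSketch {

    /** Hypothetical stand-in for one index row describing one blocklet. */
    static final class IndexRow {
        // Assumed layout: [rowCount][min][max][pathOffset][pathLength][path bytes]
        static final int ROW_COUNT_POS = 0;
        static final int MIN_POS = 4;
        static final int MAX_POS = 8;
        static final int PATH_OFFSET_POS = 12;
        static final int PATH_LENGTH_POS = 16;
        static final int HEADER_SIZE = 20;

        private final ByteBuffer buf;

        IndexRow(int rowCount, int min, int max, String path) {
            byte[] p = path.getBytes(StandardCharsets.UTF_8);
            buf = ByteBuffer.allocate(HEADER_SIZE + p.length);
            buf.putInt(rowCount).putInt(min).putInt(max)
               .putInt(HEADER_SIZE).putInt(p.length).put(p);
        }

        // Fixed-position reads: no traversal of the row is needed.
        int rowCount() { return buf.getInt(ROW_COUNT_POS); }
        int min() { return buf.getInt(MIN_POS); }
        int max() { return buf.getInt(MAX_POS); }

        // Point (2): the stored offset and length let us jump straight to the data.
        String path() {
            int offset = buf.getInt(PATH_OFFSET_POS);
            int length = buf.getInt(PATH_LENGTH_POS);
            byte[] out = new byte[length];
            ByteBuffer view = buf.duplicate();
            view.position(offset);
            view.get(out);
            return new String(out, StandardCharsets.UTF_8);
        }
    }

    /** Heavier object, built only for blocklets that must be scanned. */
    static final class Blocklet {
        final String path;
        final int rowCount;

        Blocklet(String path, int rowCount) {
            this.path = path;
            this.rowCount = rowCount;
        }
    }

    // Point (1): count(*) only sums the per-blocklet row counts.
    static long countStar(List<IndexRow> rows) {
        long total = 0;
        for (IndexRow row : rows) {
            total += row.rowCount();
        }
        return total;
    }

    // Point (3): check min/max first, materialize only the rows that survive.
    static List<Blocklet> prune(List<IndexRow> rows, int filterValue) {
        List<Blocklet> hits = new ArrayList<>();
        for (IndexRow row : rows) {
            if (filterValue < row.min() || filterValue > row.max()) {
                continue; // blocklet cannot contain the value; skip conversion
            }
            hits.add(new Blocklet(row.path(), row.rowCount()));
        }
        return hits;
    }

    public static void main(String[] args) {
        List<IndexRow> rows = Arrays.asList(
            new IndexRow(10, 1, 5, "part-0"),
            new IndexRow(20, 6, 9, "part-1"),
            new IndexRow(30, 10, 20, "part-2"));
        System.out.println(countStar(rows));          // 60
        System.out.println(prune(rows, 7).get(0).path); // part-1
    }
}
```

      In this sketch, countStar never builds a Blocklet, and prune builds one only for rows whose min/max range contains the filter value, so the cost of materialization scales with the number of matching blocklets rather than with all of them.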


          People

            Assignee: dhatchayani
            Reporter: dhatchayani


              Time Tracking

                Original Estimate: Not Specified
                Remaining Estimate: 0h
                Time Spent: 14h 50m