CarbonData / CARBONDATA-1224

Going out of memory when many segments are compacted at once in V3 format


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.0
    • Component/s: None
    • Labels: None

    Description

      In the V3 format we read the whole blocklet into memory at once in order to save IO time. However, this turns out to be costly when many CarbonData files are read in parallel.
      For example, to compact 50 segments the compactor has to open readers on all 50 segments to perform the merge sort. If each reader loads a whole blocklet into memory, the memory consumption becomes very high and there is a high chance of going out of memory.
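
      For a rough sense of scale (assuming a 64 MB blocklet size purely as an illustrative figure; the actual size depends on table configuration), buffering one whole blocklet per open reader costs roughly:

        50 readers x 64 MB per blocklet ~= 3.2 GB of heap

      for the blocklet buffers alone, before any decompression or merge-sort overhead.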

      Solution:
      In such scenarios we can introduce new V3 readers that read the data page by page instead of reading the whole blocklet at once, reducing the memory footprint.
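
      As a minimal sketch of the idea (all names below, such as PageLevelReader and readNextPage, are hypothetical illustrations and not the actual interfaces added for this fix), a page-level reader bounds the memory held per open segment to one page per column instead of a whole blocklet:

        // Hypothetical sketch only; names are illustrative, not CarbonData APIs.
        import java.io.Closeable;
        import java.io.IOException;
        import java.util.List;

        /** Reads one data page at a time instead of buffering a whole blocklet. */
        public interface PageLevelReader extends Closeable {

          /** True while the current blocklet still has unread pages. */
          boolean hasNextPage() throws IOException;

          /**
           * Decodes and returns only the next page of rows. The memory held at
           * any point is bounded by one page per column, not the full blocklet.
           */
          List<Object[]> readNextPage() throws IOException;
        }

      During compaction, each open segment reader would pull pages on demand for the merge sort, so the peak footprint grows with the number of readers times one page rather than one whole blocklet.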

      Attachments

        Activity

          People

            Assignee: Ravindra Pesala (ravi.pesala)
            Reporter: Ravindra Pesala (ravi.pesala)
            Votes: 0
            Watchers: 1


              Time Tracking

                Original Estimate: Not Specified
                Remaining Estimate: 0h
                Time Spent: 5h 50m