  1. CarbonData
  2. CARBONDATA-3985

Optimize the segment-timestamp file clean up

Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: core, spark-integration
    • Labels: None

    Description

      During a data update, CarbonProjectForUpdateCommand checks the status of each segment after the delete delta files are generated. If the status is not successful, it traverses all segment directories to clean up the .carbondata, .carbonindex and .deletedelta files that correspond to the update timestamp.

      If a great many segments have been generated in the partition directory, this traversal is very time-consuming.

      In fact, cleaning up the timestamp files only requires cleaning the files in the segment directories involved in this update.

      Proposed change: while generating the delete delta files, record the path of each segment involved in this update. Then, in checkAndUpdateStatusFiles(), if a segment's status is found to be not successful, clean up directly from the recorded segment path list instead of searching all segment directories.
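
      The idea can be illustrated with a minimal, standalone sketch. It is not CarbonData code: the class UpdateCleanupSketch, its methods, and the rule that a file belongs to an update when its name contains the update timestamp are all assumptions made for illustration. Only the three file extensions come from the description above.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of CarbonData: remembers which segment
// directories an update touched and cleans only those on failure.
final class UpdateCleanupSketch {

    // File extensions named in the issue description.
    private static final List<String> EXTENSIONS =
        List.of(".carbondata", ".carbonindex", ".deletedelta");

    // Segment directories recorded while delete delta files are generated.
    private final Set<Path> touchedSegments = new LinkedHashSet<>();

    // Would be called once per segment while writing delete delta files.
    void recordSegment(Path segmentDir) {
        touchedSegments.add(segmentDir);
    }

    // Assumption: a file belongs to the update if its name contains the
    // update timestamp and ends with one of the known extensions.
    static boolean belongsToUpdate(Path file, String timestamp) {
        String name = file.getFileName().toString();
        return name.contains(timestamp)
            && EXTENSIONS.stream().anyMatch(name::endsWith);
    }

    // Deletes matching files in the recorded segments only and returns
    // the deleted paths. No other segment directory is listed or read.
    List<Path> cleanRecordedSegments(String timestamp) throws IOException {
        List<Path> deleted = new ArrayList<>();
        for (Path dir : touchedSegments) {
            if (!Files.isDirectory(dir)) {
                continue;
            }
            // Collect matches first, then delete after the stream is closed.
            List<Path> matches = new ArrayList<>();
            try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
                for (Path file : files) {
                    if (Files.isRegularFile(file) && belongsToUpdate(file, timestamp)) {
                        matches.add(file);
                    }
                }
            }
            for (Path file : matches) {
                Files.delete(file);
                deleted.add(file);
            }
        }
        return deleted;
    }
}
```

      In this sketch the cost of a failed update's cleanup depends on the number of segments the update touched, not on the total number of segments under the partition directory. A caller would invoke recordSegment() for each segment during delete delta generation and cleanRecordedSegments() with the update timestamp when a status check fails.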

          People

            Assignee: Unassigned
            Reporter: su-article suwen
            Votes: 0
            Watchers: 1

            Dates

              Created:
              Updated:

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 2h 10m