  1. CarbonData
  2. CARBONDATA-2092

Fix compaction bug to prevent the compaction flow from going through the restructure compaction flow

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.0
    • Component/s: None
    • Labels: None

    Description

      Problem and analysis:

      ----------------------------------------

      During data load, the current schema timestamp is written to the carbondata fileHeader. Compaction uses this timestamp to decide whether a block is a restructured block or conforms to the latest schema.

      Blocklet information is now stored in the index file, so the carbondata file header is not read when that information is loaded into memory. As a result, the schema timestamp is never set on the blocklet information. During compaction, comparing the current schema timestamp with the timestamp stored in the block then produces a mismatch, and the flow goes through the restructure compaction flow instead of the normal compaction flow.
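
      The mismatch described above can be illustrated with a minimal sketch. It is not CarbonData code: the class names, the method names, the use of 0 for an unset timestamp, and the equality check are all assumptions made for illustration.

```java
/**
 * Illustrative sketch only: these classes are hypothetical and do not
 * mirror CarbonData's real classes or method signatures.
 */
final class BlockInfoSketch {
    /** Assumption: 0 stands for "schema timestamp was never set". */
    static final long UNSET = 0L;

    final long schemaTimestamp;

    BlockInfoSketch(long schemaTimestamp) {
        this.schemaTimestamp = schemaTimestamp;
    }

    /** Models the buggy path: info built from the index file, file header never read. */
    static BlockInfoSketch fromIndexWithoutHeader() {
        return new BlockInfoSketch(UNSET);
    }

    /** Models the timestamp written at load time being carried into memory. */
    static BlockInfoSketch withSchemaTimestamp(long timestamp) {
        return new BlockInfoSketch(timestamp);
    }
}

final class CompactionFlowSketch {
    enum Flow { NORMAL, RESTRUCTURE }

    /** A block whose timestamp does not match the current schema is treated as restructured. */
    static Flow chooseFlow(long currentSchemaTimestamp, BlockInfoSketch block) {
        return block.schemaTimestamp == currentSchemaTimestamp
                ? Flow.NORMAL
                : Flow.RESTRUCTURE;
    }

    public static void main(String[] args) {
        long current = 1_516_000_000_000L; // arbitrary example timestamp

        // Timestamp never set: the comparison fails and the costly flow is chosen.
        System.out.println(chooseFlow(current, BlockInfoSketch.fromIndexWithoutHeader()));

        // Timestamp present and equal to the current schema: normal flow.
        System.out.println(chooseFlow(current, BlockInfoSketch.withSchemaTimestamp(current)));
    }
}
```

      In this sketch the first call selects the restructure flow even though the schema never changed, which is the behaviour the issue reports. The second call models the timestamp being available in the in-memory block information, which is what the comparison needs in order to select the normal flow.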

      Impact:

      -------------

      Compaction performance degrades, because the restructure compaction flow sorts the data again.

            People

              Assignee: manishgupta_88 Manish Gupta
              Reporter: manishgupta_88 Manish Gupta
              Votes: 0
              Watchers: 1

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 2h 50m