Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Description
(Based on input from alex.parvulescu.)
During the segment -> segment-tar migration, a large share of the time is spent in the deduplication process. The destination repository ingests large amounts of content (each checkpoint is equivalent to a full repository state), only to find during deduplication that the data is already present there.
This happens because the diff mechanism cannot be efficient across repositories: the shortcuts that make diffing cheap only apply to node states that come from the same repository.
For example, on the source repository we have the root state r0 and a checkpoint cp0 that is very close to r0. Computing diff(r0, cp0) is extremely cheap, on the order of milliseconds. However, the sidegrade first copies r0 to the destination repository (r0 -> rx1) and then runs diff(rx1, cp0). This is very expensive: because the two node states don't originate from the same repository, diffing falls back to a slow content-equality comparison. And since the content is almost identical, a huge number of cycles are wasted deduplicating data across the two repositories.
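The sketch below illustrates why the two diffs cost so differently. It is a minimal model under stated assumptions, not Oak's code: `Node`, `repoId`, `recordId`, `copy` and `diff` are hypothetical stand-ins, and the fast path (same repository plus same record id means an identical subtree) is an assumed simplification of how a store can skip unchanged content. Copying r0 into the destination gives every node a new record id, so the cross-repository diff has to compare every node by value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CrossRepoDiff {

    // Hypothetical, simplified stand-in for a stored node state; NOT Oak's NodeState API.
    record Node(String repoId, long recordId, Map<String, String> props, Map<String, Node> children) {}

    static long nextRecordId = 0; // hypothetical record id allocator
    static int slowVisits = 0;    // number of nodes compared by value (the slow path)

    static Node node(String repoId, Map<String, String> props, Map<String, Node> children) {
        return new Node(repoId, nextRecordId++, props, children);
    }

    // Copies a subtree into another repository: same content, new record ids.
    static Node copy(Node n, String targetRepo) {
        Map<String, Node> kids = new TreeMap<>();
        n.children().forEach((name, child) -> kids.put(name, copy(child, targetRepo)));
        return node(targetRepo, n.props(), kids);
    }

    // Collects the paths that differ between 'before' and 'after'.
    static void diff(Node before, Node after, String path, List<String> changes) {
        // Fast path: same repository and same record id, so the subtrees are identical.
        if (before.repoId().equals(after.repoId()) && before.recordId() == after.recordId()) {
            return;
        }
        // Slow path: compare properties by value and recurse into every child.
        slowVisits++;
        if (!before.props().equals(after.props())) {
            changes.add(path.isEmpty() ? "/" : path);
        }
        for (Map.Entry<String, Node> e : after.children().entrySet()) {
            Node old = before.children().get(e.getKey());
            String childPath = path + "/" + e.getKey();
            if (old == null) {
                changes.add(childPath + " (added)");
            } else {
                diff(old, e.getValue(), childPath, changes);
            }
        }
        for (String name : before.children().keySet()) {
            if (!after.children().containsKey(name)) {
                changes.add(path + "/" + name + " (removed)");
            }
        }
    }

    public static void main(String[] args) {
        // r0: a root with 1000 children in the source repository.
        Map<String, Node> kids = new TreeMap<>();
        for (int i = 0; i < 1000; i++) {
            kids.put("c" + i, node("src", Map.of("v", "" + i), Map.of()));
        }
        Node r0 = node("src", Map.of(), kids);

        // cp0: shares every child with r0 except one modified node.
        Map<String, Node> cpKids = new TreeMap<>(kids);
        cpKids.put("c0", node("src", Map.of("v", "changed"), Map.of()));
        Node cp0 = node("src", Map.of(), cpKids);

        // rx1: r0 copied into the destination repository.
        Node rx1 = copy(r0, "dst");

        List<String> local = new ArrayList<>();
        slowVisits = 0;
        diff(r0, cp0, "", local);
        System.out.println("diff(r0, cp0): " + local + ", slow visits = " + slowVisits);

        List<String> cross = new ArrayList<>();
        slowVisits = 0;
        diff(rx1, cp0, "", cross);
        System.out.println("diff(rx1, cp0): " + cross + ", slow visits = " + slowVisits);
    }
}
```

Both diffs report the same single change (/c0), but diff(r0, cp0) compares only the root and the modified child by value, while diff(rx1, cp0) has to visit all 1001 nodes.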
I don't see an easy solution other than providing a diff mechanism that compares the two local states, diff(r0, cp0), but applies the resulting delta to the destination repository (i.e. on rx1). I'm not sure how easy this will turn out to be, or whether it's worth the effort.
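A minimal sketch of the proposed approach, assuming a repository state can be modelled as a flat map from property path to value. This is a hypothetical simplification: `Change`, `diff` and `apply` are illustrative names, not an existing Oak API. The delta is computed between two states of the source repository and then replayed onto the copy in the destination, so the destination only writes the changed paths instead of re-ingesting and deduplicating the whole checkpoint.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

public class ReplayDelta {

    // One change to a property path; a null value means the path was removed.
    record Change(String path, String value) {}

    // Computes the delta between two states of the SAME (source) repository.
    // Hypothetical flat-map model: a real store would walk the trees and skip
    // unchanged subtrees; here every path is compared for brevity.
    static List<Change> diff(Map<String, String> before, Map<String, String> after) {
        List<Change> delta = new ArrayList<>();
        for (Map.Entry<String, String> e : after.entrySet()) {
            if (!Objects.equals(before.get(e.getKey()), e.getValue())) {
                delta.add(new Change(e.getKey(), e.getValue())); // added or changed
            }
        }
        for (String path : before.keySet()) {
            if (!after.containsKey(path)) {
                delta.add(new Change(path, null)); // removed
            }
        }
        return delta;
    }

    // Replays a delta onto a state in the destination repository and returns the
    // result as a new map; the target map itself is left untouched.
    static Map<String, String> apply(Map<String, String> target, List<Change> delta) {
        Map<String, String> result = new TreeMap<>(target);
        for (Change c : delta) {
            if (c.value() == null) {
                result.remove(c.path());
            } else {
                result.put(c.path(), c.value());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // r0 and cp0 live in the source repository; cp0 differs from r0 in three paths.
        Map<String, String> r0 = new TreeMap<>();
        for (int i = 0; i < 1000; i++) {
            r0.put("/c" + i + "/v", "" + i);
        }
        Map<String, String> cp0 = new TreeMap<>(r0);
        cp0.put("/c0/v", "changed");
        cp0.remove("/c1/v");
        cp0.put("/new/v", "x");

        // rx1: the copy of r0 already written to the destination repository.
        Map<String, String> rx1 = new TreeMap<>(r0);

        List<Change> delta = diff(r0, cp0);          // computed locally, on the source
        Map<String, String> result = apply(rx1, delta); // replayed on the destination
        System.out.println("delta size: " + delta.size());
        System.out.println("matches cp0: " + result.equals(cp0));
    }
}
```

Here the destination receives only three changes, however large the checkpoint is; whether the tree-shaped equivalent is practical in Oak is exactly the open question raised above.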
Issue Links
- Is contained by: OAK-5290 Backport the performance improvements for oak-upgrade from trunk (Closed)