[SENTRY-2305] Optimize time taken for persistence HMS snapshot by persisting in parallel - ASF JIRA

There are a couple of options:

1. Break the total snapshot into batches and persist all of them in parallel in different transactions. As Sentry uses the repeatable_read isolation level, we should be able to have parallel writes on the same table. This raises an issue if persisting any of the batches fails: the approach needs additional logic to clean up the partially persisted snapshot. I'm evaluating this option.
   - Result: Initial results are promising. Time to persist the snapshot came down by 60%.
2. Try disabling the L1 cache while persisting the snapshot.
3. Try persisting the snapshot entries sequentially in separate transactions. Transactions that commit a huge amount of data might take longer, because they spend a lot of CPU cycles keeping the rollback log up to date.
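
The first option (parallel batches plus cleanup on failure) could look roughly like the sketch below. It is only an illustration under stated assumptions: `ParallelSnapshotPersister`, `BatchStore`, `persistBatch`, and `deleteSnapshot` are hypothetical names that do not come from the Sentry code base, and each `persistBatch` call is assumed to run in its own transaction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch; none of these names are taken from Sentry. */
final class ParallelSnapshotPersister {

    /** Hypothetical storage hook; each call is assumed to use its own transaction. */
    interface BatchStore<T> {
        void persistBatch(long snapshotId, List<T> batch);
        void deleteSnapshot(long snapshotId);
    }

    /** Splits entries into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> entries, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < entries.size(); i += batchSize) {
            int end = Math.min(i + batchSize, entries.size());
            batches.add(new ArrayList<>(entries.subList(i, end)));
        }
        return batches;
    }

    /**
     * Persists all batches in parallel. Returns true if every batch succeeded;
     * otherwise removes the partially persisted snapshot and returns false.
     */
    static <T> boolean persist(long snapshotId, List<T> entries, int batchSize,
                               int threads, BatchStore<T> store) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (List<T> batch : partition(entries, batchSize)) {
                tasks.add(() -> {
                    store.persistBatch(snapshotId, batch);
                    return null;
                });
            }
            boolean failed = false;
            // invokeAll returns only after every task has completed or failed.
            for (Future<Void> result : pool.invokeAll(tasks)) {
                try {
                    result.get();
                } catch (ExecutionException e) {
                    failed = true;
                }
            }
            if (failed) {
                // No writer is still running here, so the cleanup cannot race with a batch.
                store.deleteSnapshot(snapshotId);
                return false;
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while persisting snapshot", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

In this sketch the cleanup runs only after all batch tasks have finished, so a late batch cannot be written after the partial snapshot has been deleted. A real implementation would also have to decide whether to retry failed batches before discarding the whole snapshot.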

relates to

SENTRY-2423 Increase the allocation size for auto-increment of id's for Snapshot tables.

requires

SENTRY-2249 Enable batch insert of HMS paths in Full Snapshot.

links to

Code review Link