Description
It turns out that in 2.0 we changed the way full snapshots are sent from Sentry to HDFS. Previously they used HMSPaths, which has a tree structure and eliminates some duplication. SENTRY-1827 also helped to compress this on the serialization side.
Now we use the TPathChanges structure, which is not tree-based and represents paths very inefficiently: required list<list<string>> addPaths; so we split each path on slashes and store a list of its elements instead of storing a tree. As a result we may use much more memory.
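A rough illustration of the difference, as a minimal sketch rather than Sentry's actual code: the helper names and the sample paths below are hypothetical, and the comparison counts stored path components, not bytes. The flat form mirrors list<list<string>>, while the tree form stores each shared prefix component once.

```python
def split_paths(paths):
    """Flat form: one list of components per path (like list<list<string>>)."""
    return [p.strip("/").split("/") for p in paths]


def build_tree(paths):
    """Tree form: nested dicts, so each shared prefix component is stored once."""
    root = {}
    for p in paths:
        node = root
        for part in p.strip("/").split("/"):
            node = node.setdefault(part, {})
    return root


def count_flat(flat):
    """Number of component strings held by the flat representation."""
    return sum(len(parts) for parts in flat)


def count_tree(node):
    """Number of nodes (one per distinct component position) in the tree."""
    return sum(1 + count_tree(child) for child in node.values())


# Hypothetical sample paths sharing a common warehouse prefix.
paths = [
    "/user/hive/warehouse/db1.db/t1",
    "/user/hive/warehouse/db1.db/t2",
    "/user/hive/warehouse/db2.db/t1",
]
flat = split_paths(paths)
tree = build_tree(paths)
print(count_flat(flat), count_tree(tree))  # prints "15 8"
```

With only three paths the flat form already holds almost twice as many components, and the gap widens as more tables share the same prefix.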
Attachments
Issue Links
- is related to
  - SENTRY-1916 Sentry should not store paths outside of the prefix (Resolved)
  - SENTRY-1951 Old SentryStore.retrieveFullPathsImage() should be removed (Resolved)
- relates to
  - SENTRY-872 Uber jira for HMS HA + Sentry HA redesign (Resolved)
  - SENTRY-1827 Minimize TPathsDump thrift message used in HDFS sync (Resolved)
  - SENTRY-1907 Potential memory optimization when handling big full snapshots. (Resolved)
  - SENTRY-1909 Improvements for memory usage when full path snapshot is sent from Sentry to NN (Resolved)
- links to