> What we want to use this feature for is for a "replication" feature in our app, which is really an infrastructure service and
> already uses an admin (or similar highly privileged user) session on the target jackrabbit instance to copy over content from
> another one. Having this special "access-by-datastore-id" permission flag would only be used for that case anyway.
so, you are justifying something that looks problematic security-wise with the argument that it's just one specific use-case.
IMHO we should not hack the built-in permissions for something that looks like a nice feature for us without thinking
about the consequences. claiming that only our replication user would be allowed to do this is as naive as claiming that
something is just an 'admin-only' task. that's not how the repository is being used, and if it were, we could equally
just hardcode access to getValueById to a single, dedicated user (which obviously is a bad idea).
> Now if we ensure the data store IDs are not guessable, then there is no option to browse the repository's binaries.
> If we avoid using simple hashes of the binary for the (exposed) ID, it will not be possible to check for the existence
> of certain documents known to an attacker. In fact, an attacker will only be able to get to the ID if he has the access
> rights to the content in the first place.
that's correct and i don't see a problem with this part.
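to make the "not guessable" part concrete, here is a minimal sketch of one way it could be done: the exposed ID is the internal data store ID plus an HMAC computed with a server-side key, so knowing the hash of a document is not enough to construct a valid ID. the class and method names (SignedContentId, expose, resolve) and the choice of HmacSHA256 are my assumptions for illustration, not existing Jackrabbit API.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper for illustration only; not part of Jackrabbit.
public class SignedContentId {
    private final byte[] secret; // server-side key, never handed to clients

    public SignedContentId(byte[] secret) {
        this.secret = secret.clone();
    }

    /** Exposed ID = internal data store ID + ":" + HMAC over that ID. */
    public String expose(String dataStoreId) {
        return dataStoreId + ":" + hmacHex(dataStoreId);
    }

    /** Returns the internal ID if the signature verifies, otherwise null. */
    public String resolve(String exposedId) {
        int sep = exposedId.lastIndexOf(':');
        if (sep < 0) {
            return null; // no signature part at all
        }
        String id = exposedId.substring(0, sep);
        byte[] expected = hmacHex(id).getBytes(StandardCharsets.US_ASCII);
        byte[] actual = exposedId.substring(sep + 1).getBytes(StandardCharsets.US_ASCII);
        // constant-time comparison of the two signatures
        return MessageDigest.isEqual(expected, actual) ? id : null;
    }

    private String hmacHex(String data) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            StringBuilder sb = new StringBuilder();
            for (byte b : mac.doFinal(data.getBytes(StandardCharsets.UTF_8))) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC not available", e);
        }
    }
}
```

note that this only covers guessing and forging: anyone who legitimately obtained an exposed ID can still pass it on.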
what is problematic IMO is the fact that once you get access to the content ID you may be able to look at the binary
irrespective of the accessibility of the property (or properties) that hold(s) this value.
in other words: what we are adding here is an additional dimension to the way access control is used and enforced
by the repository. we have permissions on nodes and properties and we are extending this to values irrespective of
which property this value was attached to.
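a toy model of that concern, as a sketch only: a path-based read checks the ACL of the property, while an ID-based read has no property to check against. everything here (ValueAccessModel, store, readProperty, readById) is hypothetical and deliberately simplified; it is not how Jackrabbit implements access control.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model for illustration only; all names are hypothetical.
public class ValueAccessModel {
    private final Map<String, String> contentIdByProperty = new HashMap<>();
    private final Map<String, byte[]> binaryById = new HashMap<>();
    private final Map<String, Set<String>> readersByProperty = new HashMap<>();

    /** Stores a binary under a property path and records who may read that property. */
    public void store(String propertyPath, String contentId, byte[] data, String... readers) {
        contentIdByProperty.put(propertyPath, contentId);
        binaryById.put(contentId, data);
        readersByProperty.put(propertyPath, new HashSet<>(Arrays.asList(readers)));
    }

    /** Path-based read: the ACL of the property is enforced. */
    public byte[] readProperty(String user, String propertyPath) {
        Set<String> readers = readersByProperty.get(propertyPath);
        if (readers == null || !readers.contains(user)) {
            throw new SecurityException(user + " may not read " + propertyPath);
        }
        return binaryById.get(contentIdByProperty.get(propertyPath));
    }

    /** ID-based read: no property is involved, so there is no ACL to enforce. */
    public byte[] readById(String user, String contentId) {
        return binaryById.get(contentId);
    }
}
```

in this model a user who is denied by readProperty still gets the binary from readById as soon as the content ID has leaked from anywhere.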
if we add the contentId handling to the API, it will be used (i can already see the service coming that exposes contentIds
with an admin session, is searchable, and where google's inurl: operator will be a perfect fit to find all kinds of
contentIds all over the world).
don't get me wrong: i am not opposed to having this in general, but i am totally opposed to just hacking it in without
having a clear picture of what we are doing and a careful reevaluation of what that actually means for our threat model
and for further development (including Oak).
> If not, we really need to think of bringing a similar performance-optimized replication feature into Jackrabbit / Oak itself.
again: no objection to this... but 'thinking' is definitely the key word here.