[HDDS-7297] Content sharing across objects. - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: Ozone Datanode, Ozone Filesystem, Ozone Manager
Labels: None

Description

This request/suggestion was brought up by omalley during ApacheCon 2022 (https://www.apachecon.com/acna2022/).

When mutating or creating a large table, applications could get a large performance boost if they could bring in data from other existing objects or from older versions of the same object. In effect, the same copy of the data would be transparently addressable from multiple objects, or reused when an object is updated.

This capability can take many forms from an implementation standpoint, but we must design the API surface for applications first.

To make progress, we need to:

1. Identify the API surface that needs to be exposed for applications such as Iceberg or ORC writers to leverage this feature. This could be done either by exposing the underlying blocks, or by abstracting the blocks away and addressing content only as ranges in a file that are sourced from other files (and their corresponding ranges, similar to a scatter-gather list).
   1. Should this be an extension of the vectored IO APIs?
   2. Is there a need to expose the layout of sharable content?
2. Model the API on the backend and work out how Ozone will make it work. This needs to be reasoned about across both EC and Replication.
3. Decide how this would be made available as an extension to the S3 APIs in addition to OFS.
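To make the "ranges in a file sourced from other files" option concrete, here is a minimal Java sketch of what a scatter-gather description of a new object might look like. Everything in it is an assumption for illustration: `SourceRange`, `ComposeRequest`, `append`, `resolve`, and the object keys are hypothetical names, not existing Ozone, Hadoop, or S3 APIs, and the sketch only models the client-side range bookkeeping, not how the backend would share blocks.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical: one byte range borrowed from an existing object. */
final class SourceRange {
    final String sourceKey;   // key of the existing object that holds the bytes
    final long sourceOffset;  // first byte to borrow from that object
    final long length;        // number of bytes to borrow

    SourceRange(String sourceKey, long sourceOffset, long length) {
        if (sourceKey == null || sourceOffset < 0 || length <= 0) {
            throw new IllegalArgumentException("invalid source range");
        }
        this.sourceKey = sourceKey;
        this.sourceOffset = sourceOffset;
        this.length = length;
    }
}

/** Hypothetical: an ordered scatter-gather list that describes a new object. */
final class ComposeRequest {
    private final List<SourceRange> ranges = new ArrayList<>();

    /** Appends a range; the new object is the concatenation of all ranges. */
    ComposeRequest append(String sourceKey, long sourceOffset, long length) {
        ranges.add(new SourceRange(sourceKey, sourceOffset, length));
        return this;
    }

    /** Logical length of the composed object. */
    long totalLength() {
        long total = 0;
        for (SourceRange r : ranges) {
            total += r.length;
        }
        return total;
    }

    /**
     * Maps an offset in the composed object to the source bytes backing it.
     * The result starts at the matching source byte and runs to the end of
     * the range that contains it.
     */
    SourceRange resolve(long destOffset) {
        if (destOffset < 0) {
            throw new IndexOutOfBoundsException("negative offset " + destOffset);
        }
        long start = 0; // offset in the composed object where the current range begins
        for (SourceRange r : ranges) {
            if (destOffset < start + r.length) {
                long delta = destOffset - start;
                return new SourceRange(r.sourceKey, r.sourceOffset + delta, r.length - delta);
            }
            start += r.length;
        }
        throw new IndexOutOfBoundsException("offset " + destOffset + " is past the end");
    }
}
```

For example, a writer could describe a new table file as `new ComposeRequest().append("table/v1.orc", 0, 100).append("table/delta.orc", 50, 25)`. A read at offset 110 of the new object would then resolve to offset 60 of `table/delta.orc`. Whether applications should see ranges like this or the underlying blocks is exactly the open question above.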

HDDS-7288 (https://issues.apache.org/jira/browse/HDDS-7288) is a duplicate of this one. Filing this issue to capture the full context of the discussion.

People

Assignee: Ritesh Shukla

Reporter: Ritesh Shukla

Votes: 0

Watchers: 3

Dates

Created: 07/Oct/22 21:07

Updated: 08/Oct/22 02:21