Apache Ozone / HDDS-7297

Content sharing across objects.

Details

    Description

      This request/suggestion was brought up by omalley during ApacheCon 2022 (https://www.apachecon.com/acna2022/).

      When creating or mutating a large table, applications could see a large performance gain if they could bring in data from other existing objects or from older versions of the same object. The same copy of the data would then be transparently addressable from multiple objects, or from successive versions of an updated object.

      This capability can take many forms from an implementation standpoint, but we must design the API surface for applications first.

      To make progress, we need to:

      1. Identify the API surface that needs to be exposed for applications such as Iceberg or ORC writers to leverage this feature. Should this be done by exposing the underlying blocks, or by abstracting the blocks away and addressing content only as ranges in a file that are sourced from other files (and their corresponding ranges, similar to a scatter-gather list)?
        1. Should this be an extension of the vectored IO APIs?
        2. Is there a need to expose the layout of sharable content?
      2. Model the API in the backend and determine how Ozone will make it work. This needs to be reasoned about across both EC and replication.
      3. Determine how this would be made available as an extension to the S3 APIs in addition to OFS.
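
      To make the "ranges in a file sourced from other files" option in item 1 concrete, here is a minimal sketch of a scatter-gather style manifest. Every name in it (ScatterGatherSketch, SourceRange, Manifest, Resolved, and the object keys) is hypothetical and invented for illustration; none of it is an existing Ozone, Hadoop, or S3 API, and it says nothing about how blocks, EC, or replication would be handled in the backend.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: a new object described as ordered ranges of existing objects. */
public class ScatterGatherSketch {

    /** A byte range borrowed from an existing object (hypothetical type). */
    static final class SourceRange {
        final String sourceKey;
        final long sourceOffset;
        final long length;

        SourceRange(String sourceKey, long sourceOffset, long length) {
            if (sourceOffset < 0 || length <= 0) {
                throw new IllegalArgumentException("offset must be >= 0 and length > 0");
            }
            this.sourceKey = sourceKey;
            this.sourceOffset = sourceOffset;
            this.length = length;
        }
    }

    /** Where a logical offset of the new object lands inside a source object. */
    static final class Resolved {
        final String sourceKey;
        final long sourceOffset;

        Resolved(String sourceKey, long sourceOffset) {
            this.sourceKey = sourceKey;
            this.sourceOffset = sourceOffset;
        }
    }

    /** Ordered list of ranges that together form the new object's content. */
    static final class Manifest {
        private final List<SourceRange> ranges = new ArrayList<>();

        /** Appends a range of an existing object to the end of the new object. */
        Manifest append(String sourceKey, long sourceOffset, long length) {
            ranges.add(new SourceRange(sourceKey, sourceOffset, length));
            return this;
        }

        /** Logical length of the new object: the sum of all range lengths. */
        long totalLength() {
            long total = 0;
            for (SourceRange r : ranges) {
                total += r.length;
            }
            return total;
        }

        /** Maps a logical offset in the new object back to (source object, offset). */
        Resolved resolve(long logicalOffset) {
            if (logicalOffset < 0) {
                throw new IndexOutOfBoundsException("negative offset: " + logicalOffset);
            }
            long start = 0; // logical offset at which the current range begins
            for (SourceRange r : ranges) {
                if (logicalOffset < start + r.length) {
                    return new Resolved(r.sourceKey, r.sourceOffset + (logicalOffset - start));
                }
                start += r.length;
            }
            throw new IndexOutOfBoundsException("offset beyond end: " + logicalOffset);
        }
    }

    public static void main(String[] args) {
        // Hypothetical keys: a new table version reusing data from two older files.
        Manifest m = new Manifest()
            .append("table/v1/part-0.orc", 0, 4096)
            .append("table/v1/part-1.orc", 1024, 2048);

        Resolved r = m.resolve(5000);
        System.out.println("length=" + m.totalLength());
        System.out.println(r.sourceKey + " @ " + r.sourceOffset);
    }
}
```

      In this sketch the application never sees blocks; it only names source objects and byte ranges, which is the "abstracting the blocks away" alternative. Exposing blocks directly would instead require the layout question in item 1.2 to be answered first.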

      https://issues.apache.org/jira/browse/HDDS-7288 is a duplicate of this issue. Filing this one to capture the full context of the discussion.

          People

            ritesh Ritesh Shukla