Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version: 3.6.0
    • Fix Version: 3.7.0
    • Components: Blob, JMAP
    • Labels: None

    Description

      This is a concern for both privacy and cost control (as one needs to pay for storage).

      JMAP defines no method to delete uploaded blobs (maybe I could propose something at the IETF).

      https://jmap.io/spec-core.html#uploading-binary-data suggests that the server may decide to delete the data:

      Under rare circumstances, the server may have deleted the blob before the client uses it; 
      the client should keep a reference to the local file so it can upload it again in such a situation.
      

      Root cause of the issue

      We rely on the AttachmentManager for uploads - which is inherited from the JMAP draft.

      AttachmentManager uses the following fallback rights mechanism:

      • First, see whether the user accessing the content holds a message referencing that attachment.
      • If not, check whether that user uploaded the attachment.
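
      The two-step fallback above could be sketched as follows. This is a minimal model using in-memory collections; all names are hypothetical, and the real James code resolves these checks against mailbox and attachment storage.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the AttachmentManager fallback rights check.
class AttachmentRightsSketch {
    // attachmentId -> usernames holding a message referencing that attachment
    private final Map<String, Set<String>> referencedBy;
    // attachmentId -> usernames who uploaded it (cf. AttachmentMapper.getOwners)
    private final Map<String, Set<String>> owners;

    AttachmentRightsSketch(Map<String, Set<String>> referencedBy,
                           Map<String, Set<String>> owners) {
        this.referencedBy = referencedBy;
        this.owners = owners;
    }

    boolean canAccess(String username, String attachmentId) {
        // Step 1: does the user hold a message referencing the attachment?
        if (referencedBy.getOrDefault(attachmentId, Set.of()).contains(username)) {
            return true;
        }
        // Step 2 (fallback): did the user upload the attachment?
        return owners.getOrDefault(attachmentId, Set.of()).contains(username);
    }
}
```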

      AttachmentManager holds data referenced by user messages; automatic deletion without a clear separation of concepts thus looks scary...

      How:

      We should deprecate the following AttachmentMapper methods (and the underlying storage code), and simplify the AttachmentManager code accordingly:

      public interface AttachmentMapper extends Mapper {
          // to be deprecated
      
          Publisher<AttachmentMetadata> storeAttachmentForOwner(ContentType contentType, InputStream attachmentContent, Username owner);
      
          Collection<Username> getOwners(AttachmentId attachmentId) throws MailboxException;
      }
      

      We should write an UploadedContentRepository, holding only the content, the content-type, the owner and the size of the data. The upload date can be useful too, even if not required by the JMAP APIs. Backed by the BlobStore (and thus ObjectStorage), it will also need a metadata system on top of it (Cassandra).
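
      Such a repository could look like this. This is a sketch only: the interface name comes from the proposal, but the method signatures and the UploadMetadata shape are assumptions, not an actual James API.

```java
import java.io.InputStream;
import java.time.Instant;

// Hypothetical sketch of the proposed UploadedContentRepository.
interface UploadedContentRepository {
    // Metadata kept in Cassandra: owner, content-type, size, upload date.
    record UploadMetadata(String uploadId, String owner, String contentType,
                          long size, Instant uploadDate) {}

    // Store the content in the BlobStore and its metadata in Cassandra.
    UploadMetadata store(String owner, String contentType, InputStream content);

    // Retrieve content, enforcing that only the owner may read it.
    InputStream retrieve(String owner, String uploadId);
}
```

      Keeping only content, owner, content-type, size and upload date avoids the entanglement with message-referenced attachments that makes deletion in AttachmentManager scary.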

      Data expiry would be achieved via bucket deletion: all data uploaded in a given month is held in one bucket, and at month+2 that bucket can be dropped - ensuring no data younger than a month is deleted. We can likely accept dangling metadata, as no critical data is held there (user, size, content type). If needed, a scroll could come and clean up expired metadata, but it might be expensive to run.
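
      The month+2 rule can be sketched as below; the bucket naming scheme is an assumption for illustration.

```java
import java.time.YearMonth;

// Sketch of the month-bucket expiry rule: uploads from a given month share
// one bucket, and a bucket may be dropped once month+2 is reached, which
// guarantees no data younger than one full month is ever deleted.
class UploadBucketSketch {
    static String bucketFor(YearMonth uploadMonth) {
        return "uploads-" + uploadMonth; // e.g. "uploads-2021-03"
    }

    static boolean canDrop(YearMonth bucketMonth, YearMonth currentMonth) {
        // Droppable once two months have elapsed since the bucket's month.
        return !currentMonth.isBefore(bucketMonth.plusMonths(2));
    }
}
```

      Dropping a whole bucket is a single cheap ObjectStorage operation, which is why expiry is done per bucket rather than per blob.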

      A webAdmin endpoint would expose the cleanup, with an external scheduler triggering it.

      We followed a similar design for the DeletedMessageVault (https://issues.apache.org/jira/browse/JAMES-2811).

      I bet my team could work on this topic, but we do not have a plan for it just yet.

      Impact

      • Blobs uploaded before this proposed change will remain accessible via the AttachmentManager uploader rights path (until that path is removed), and inaccessible afterwards.
      • Cleaning up blob content uploaded before this change is applied is a non-goal of my proposal. A separate batch could be used, reading Cassandra data and deleting the uploaded blobs. A task could maybe even be exposed for such needs...

      Definition of done

      Demonstrate data expiry in an integration test, playing with a mocked clock injected via Guice.
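
      Such a mocked clock could look like the sketch below; the class name is hypothetical, and the Guice wiring (binding java.time.Clock to this instance so the expiry logic reads time from it) is assumed rather than shown.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Sketch of a mutable Clock for integration tests: the test advances it
// past the month+2 boundary, then asserts the uploaded blob is gone.
class UpdatableTestClock extends Clock {
    private Instant now;

    UpdatableTestClock(Instant start) {
        this.now = start;
    }

    void advance(Duration duration) {
        now = now.plus(duration);
    }

    @Override
    public Instant instant() {
        return now;
    }

    @Override
    public ZoneId getZone() {
        return ZoneOffset.UTC;
    }

    @Override
    public Clock withZone(ZoneId zone) {
        return this;
    }
}
```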

      Documentation needs to be written so that admins do not forget to schedule the cleanup task.

      People

        Assignee: Benoit Tellier (btellier)
        Reporter: Benoit Tellier (btellier)

      Time Tracking

        Time Spent: 7h 40m