Index: src/main/java/org/apache/jackrabbit/api/JackrabbitValueFactory.java =================================================================== --- src/main/java/org/apache/jackrabbit/api/JackrabbitValueFactory.java (revision 1837264) +++ src/main/java/org/apache/jackrabbit/api/JackrabbitValueFactory.java (working copy) @@ -18,6 +18,7 @@ package org.apache.jackrabbit.api; +import java.io.InputStream; import javax.jcr.AccessDeniedException; import javax.jcr.Binary; import javax.jcr.RepositoryException; @@ -24,6 +25,7 @@ import javax.jcr.Session; import javax.jcr.ValueFactory; +import org.apache.jackrabbit.api.binary.BinaryDownloadOptions; import org.apache.jackrabbit.api.binary.BinaryUpload; import org.apache.jackrabbit.api.binary.BinaryDownload; import org.jetbrains.annotations.NotNull; @@ -32,42 +34,57 @@ /** * Defines optional functionality that a {@link ValueFactory} may choose to - * provide. A {@link ValueFactory} may also implement this interface without - * supporting all of the capabilities in this interface. Each method of the + * provide. A {@link ValueFactory} may also implement this interface without + * supporting all of the capabilities in this interface. Each method of the * interface describes the behavior of that method if the underlying capability * is not available. + * *

- * Currently this interface defines the following optional features: + * This interface defines the following optional features: *

+ * *

* The features are described in more detail below. * *

Direct Binary Access

- *

+ * * The Direct Binary Access feature provides the capability for a client to - * upload or download binaries directly to/from a storage location. For - * example, this might be a cloud storage providing high-bandwidth direct - * network access. This API allows for requests to be authenticated and for - * access permission checks to take place within the repository, but for clients - * to then access the storage location directly. + * upload or download binaries directly to/from a storage location. For example, + * this might be a cloud storage providing high-bandwidth direct network access. + * This API allows for requests to be authenticated and for access permission + * checks to take place within the repository, but for clients to then access + * the storage location directly. + * *

- * The feature consists of two parts, direct binary upload and direct binary - * download. + * The feature consists of two parts, download and upload. + * + *

Direct Binary Download

+ * + * This feature enables remote clients to download binaries directly from a + * storage location without streaming the binary through the Jackrabbit-based + * application. + * *

+ * For an existing {@link Binary} value that implements {@link BinaryDownload}, + * a read-only URI (see {@link BinaryDownload#getURI(BinaryDownloadOptions)}) + * can be retrieved and passed to a remote client, such as a browser. * *

Direct Binary Upload

- *

+ * * This feature enables remote clients to upload binaries directly to a storage * location. + * *

- * When adding binaries already present on the same JVM or server as Jackrabbit - * or Oak, for example because they were generated locally, please use the - * regular JCR API for {@link javax.jcr.Property#setValue(Binary) adding - * binaries through input streams} instead. This feature is solely designed for - * remote clients. + * Note: When adding binaries already present on the same JVM/server as + * the JCR repository, for example because they were generated locally, please + * use the regular JCR API {@link ValueFactory#createBinary(InputStream)} + * instead. This feature is solely designed for remote clients. + * *

* The direct binary upload process is split into 3 phases: *

    @@ -75,82 +92,67 @@ * Initialize: A remote client makes request to the * Jackrabbit-based application to request an upload, which calls {@link * #initiateBinaryUpload(long, int)} and returns the resulting {@link - * BinaryUpload information} to the remote client. + * BinaryUpload instructions} to the remote client. * *
  1. * Upload: The remote client performs the actual binary upload - * directly to the binary storage provider. The {@link BinaryUpload} + * directly to the binary storage provider. The {@link BinaryUpload} * returned from the previous call to {@link * #initiateBinaryUpload(long, int)} contains detailed instructions on - * how to complete the upload successfully. For more information, see - * the BinaryUpload documentation. + * how to complete the upload successfully. *
  2. *
  3. * Complete: The remote client notifies the Jackrabbit-based - * application that step 2 is complete. The upload token returned in + * application that step 2 is complete. The upload token returned in * the first step (obtained by calling {@link - * BinaryUpload#getUploadToken()} is passed by the client to {@link + * BinaryUpload#getUploadToken()}) is passed by the application to {@link * #completeBinaryUpload(String)}. This will provide the application * with a regular {@link Binary JCR Binary} that can then be used to * write JCR content including the binary (such as an nt:file structure) - * and {@link Session#save() persist} it. + * and persist it using {@link Session#save}. *
  4. *
- *

- *

Direct Binary Download

- *

- * The direct binary download process is described in detail in {@link - * BinaryDownload}. */ @ProviderType public interface JackrabbitValueFactory extends ValueFactory { + /** - * Initiate a transaction to upload binary data directly to a storage - * location. {@link IllegalArgumentException} will be thrown if an upload + * Initiate a transaction to upload a binary directly to a storage + * location and return {@link BinaryUpload} instructions for a remote client. + * Returns {@code null} if the feature is not available. + * + *

+ * {@link IllegalArgumentException} will be thrown if an upload * cannot be supported for the required parameters, or if the parameters are - * otherwise invalid. For example, if the value of {@code maxSize} exceeds - * the size limits for a single binary upload for the implementation or the - * service provider, or if the value of {@code maxSize} divided by {@code - * maxParts} exceeds the size limit for an upload or upload part of the - * implementation or the service provider, {@link IllegalArgumentException} - * may be thrown. - *

- * Each service provider has specific limitations on upload sizes, - * multi-part upload support, part sizes, etc. which can result in {@link - * IllegalArgumentException} being thrown. You should consult the - * documentation for your underlying implementation and your service - * provider for details. - *

- * If this call is successful, a {@link BinaryUpload} is returned - * which contains the information a client needs to successfully complete - * a direct upload. + * otherwise invalid. Each service provider has specific limitations. * - * @param maxSize The expected maximum size of the binary to be uploaded by - * the client. If the actual size of the binary is known, this - * size should be used; otherwise, the client should make a best - * guess. If a client calls this method with one size and then - * later determines that the guess was too small, the transaction - * should be restarted by calling this method again with the correct - * size. + * @param maxSize The exact size of the binary to be uploaded or the + * estimated maximum size if the exact size is unknown. + * If the estimation was too small, the transaction + * should be restarted by invoking this method again + * using the correct size. * @param maxURIs The maximum number of upload URIs that the client can - * accept. The implementation will ensure that an upload of size - * {@code maxSize} can be completed by splitting the value of {@code - * maxSize} into parts, such that the size of the largest part does - * not exceed any known implementation or service provider - * limitations on upload part size and such that the number of parts - * does not exceed the value of {@code maxURIs}. If this is not - * possible, {@link IllegalArgumentException} will be thrown. A - * client may specify -1 for this value, indicating that any number - * of URIs may be returned. - * @return A {@link BinaryUpload} that can be used by the client to complete - * the upload via a call to {@link #completeBinaryUpload(String)}, + * accept, for example due to message size limitations. + * A value of -1 indicates no limit. 
+ * Upon a successful return, it is ensured that an upload + * of {@code maxSize} can be completed by splitting the + * binary into at most {@code maxURIs} parts, otherwise + * {@link IllegalArgumentException} will be thrown. + * + * @return A {@link BinaryUpload} providing the upload instructions, * or {@code null} if the implementation does not support the direct * upload feature. + * * @throws IllegalArgumentException if the provided arguments are - * invalid or if a valid upload cannot be completed given the - * provided arguments. - * @throws AccessDeniedException if it is determined that insufficient - * permission exists to perform the upload. + * invalid or if an upload cannot be completed given the + * provided arguments. For example, if the value of {@code maxSize} + * exceeds the size limits for a single binary upload for the + * implementation or the service provider, or if the value of + * {@code maxSize} divided by {@code maxURIs} exceeds the size + * limit for an upload or upload part. + * + * @throws AccessDeniedException if the session has insufficient + * permission to perform the upload. */ @Nullable BinaryUpload initiateBinaryUpload(long maxSize, int maxURIs) @@ -157,30 +159,26 @@ throws IllegalArgumentException, AccessDeniedException; /** - * Complete a transaction to upload binary data directly to a storage - * location. The client must provide a valid {@code uploadToken} that can - * only be obtained via a previous call to {@link - * #initiateBinaryUpload(long, int)}. If the {@code uploadToken} is - * unreadable or invalid, {@link IllegalArgumentException} will be thrown. + * Complete the transaction of uploading a binary directly to a storage + * location and return a {@link Binary} to set as value for a binary + * JCR property. The binary is not automatically associated with + * any location in the JCR. + * *

- * Calling this method does not associate the returned {@link Binary} with - * any location in the repository. It is the responsibility of the client - * to do this if desired. - *

- * The {@code uploadToken} can be obtained from the {@link - * BinaryUpload} returned from a prior call to {@link - * #initiateBinaryUpload(long, int)}. Clients should treat the {@code - * uploadToken} as an immutable string, and should expect that - * implementations will sign the string and verify the signature when this - * method is called. + * The client must provide a valid upload token, obtained from + * {@link BinaryUpload#getUploadToken()} when this transaction was initialized + * using {@link #initiateBinaryUpload(long, int)}. + * If the {@code uploadToken} is unreadable or invalid, + * an {@link IllegalArgumentException} will be thrown. * - * @param uploadToken A String that is used to identify the direct upload - * transaction. + * @param uploadToken A String identifying the upload transaction. + * * @return The uploaded {@link Binary}, or {@code null} if the * implementation does not support the direct upload feature. - * @throws IllegalArgumentException if the {@code uploadToken} is - * unreadable or invalid. - * @throws RepositoryException if a repository access error occurs. + * + * @throws IllegalArgumentException if the {@code uploadToken} is invalid or + * does not identify a known binary upload. + * @throws RepositoryException if another error occurs. */ @Nullable Binary completeBinaryUpload(@NotNull String uploadToken) Index: src/main/java/org/apache/jackrabbit/api/binary/BinaryDownload.java =================================================================== --- src/main/java/org/apache/jackrabbit/api/binary/BinaryDownload.java (revision 1837264) +++ src/main/java/org/apache/jackrabbit/api/binary/BinaryDownload.java (working copy) @@ -33,25 +33,72 @@ @ProviderType public interface BinaryDownload extends Binary { /** - * Get a URI for downloading a {@link Binary} directly from a storage - * location with the provided {@link BinaryDownloadOptions}. 
This is - * probably a signed URI with a short TTL (time to live), although the API - * does not require it to be so. + * + * Returns a URI for downloading this binary directly from the storage location. + * *

- * The implementation will attempt to apply the specified {@code - * downloadOptions} to the subsequent download. For example, if the caller - * knows that the URI refers to a specific type of content, the caller can - * specify that content type by setting the internet media type and - * character encoding in the {@code downloadOptions}. The caller may also - * use a default instance obtained via {@link BinaryDownloadOptions#DEFAULT} - * in which case the caller is indicating that the default behavior of the - * service provider is acceptable. + * Using the {@code downloadOptions} parameter, some response headers of the + * download request can be overwritten, if supported by the storage provider. + * This is necessary to pass information that is only stored in the JCR in + * application specific structures, and not reliably available in the binary + * storage. * + * {@link BinaryDownloadOptions} supports, but is not limited to: + *

+ * + * Specifying {@link BinaryDownloadOptions#DEFAULT} will use mostly empty + * defaults, relying on the storage provider attributes for this binary + * (that might be empty or different from the information in the JCR). + * + *

+ * Security considerations: + * + *

+ * * @param downloadOptions * A {@link BinaryDownloadOptions} instance which is used to * request specific options on the binary to be downloaded. * {@link BinaryDownloadOptions#DEFAULT} should be used if the - * caller wishes to accept the service provider's default + * caller wishes to accept the storage provider's default * behavior. * @return A URI for downloading the binary directly, or {@code null} if the * binary cannot be downloaded directly or if the underlying Index: src/main/java/org/apache/jackrabbit/api/binary/BinaryDownloadOptions.java =================================================================== --- src/main/java/org/apache/jackrabbit/api/binary/BinaryDownloadOptions.java (revision 1837264) +++ src/main/java/org/apache/jackrabbit/api/binary/BinaryDownloadOptions.java (working copy) @@ -184,10 +184,10 @@ /** * Sets the character encoding of the {@link BinaryDownloadOptions} object to be - * built. This value should be a valid {@code jcr:encoding}. + * built. This value should be a valid {@code jcr:encoding} property value. *

* Calling this method has the effect of instructing the service - * provider to set {@code charecterEncoding} as the "charset" parameter + * provider to set {@code characterEncoding} as the "charset" parameter * of the content type in the {@code Content-Type} header field of the * response to a request issued with a URI obtained by calling {@link * BinaryDownload#getURI(BinaryDownloadOptions)}. This value can be @@ -195,9 +195,9 @@ * BinaryDownloadOptions#getCharacterEncoding()} on the instance returned by a * call to {@link #build()}. *

- * Note that setting the character encoding only makes sense if the internet media type has - * also been set. See {@link - * #withMediaType(String)}. + * Note that setting the character encoding only makes sense if the internet + * media type has also been set, and that media type actually defines a + * "charset" parameter. See {@link #withMediaType(String)}. *

* The caller should ensure that the proper character encoding has been set for * the internet media type; the implementation does not perform any validation of @@ -216,7 +216,7 @@ /** * Sets the filename of the {@link BinaryDownloadOptions} object to be - * built. + * built. This would typically be based on a JCR node name. *

* Calling this method has the effect of instructing the service * provider to set {@code fileName} as the filename in the {@code Index: src/main/java/org/apache/jackrabbit/api/binary/BinaryUpload.java =================================================================== --- src/main/java/org/apache/jackrabbit/api/binary/BinaryUpload.java (revision 1837264) +++ src/main/java/org/apache/jackrabbit/api/binary/BinaryUpload.java (working copy) @@ -25,78 +25,104 @@ import org.osgi.annotation.versioning.ProviderType; /** - * This extension interface provides a mechanism whereby a client can upload a - * binary directly to a storage location. An object of this type can be - * created by a call to {@link - * JackrabbitValueFactory#initiateBinaryUpload(long, int)} which will return an - * object of this type if the underlying implementation supports direct upload - * functionality. When calling this method, the client indicates the expected - * size of the binary and the number of URIs that it is willing to accept. The - * implementation will attempt to create an instance of this class that is - * suited to enabling the client to complete the upload successfully. + * Describes uploading a binary through HTTP requests in a single or multiple + * parts. This will be returned by + * {@link JackrabbitValueFactory#initiateBinaryUpload(long, int)}. A high-level + * overview of the process can be found in {@link JackrabbitValueFactory}. + * *

- * Using an instance of this class, a client can then use one or more of the - * included URIs for uploading the binary directly by calling {@link - * #getUploadURIs()} and iterating through the URIs returned. Multi-part - * uploads are supported by the interface, although they may not be supported - * by the underlying implementation. + * Note that although the API allows URI schemes other than "http(s)", the + * upload functionality is currently only defined for HTTP. + * *

- * Once a client finishes uploading the binary data, the client must then call + * A caller usually needs to pass the information provided by this interface to + * a remote client that is in possession of the actual binary, who then has to + * upload the binary using HTTP according to the logic described below. A remote + * client is expected to support multi-part uploads as per the logic described + * below, in case multiple URIs are returned. + * + *

+ * Once a remote client finishes uploading the binary data, the application must + * be notified and must then call * {@link JackrabbitValueFactory#completeBinaryUpload(String)} to complete the - * upload. This call requires an upload token which can be obtained from an - * instance of this class by calling {@link #getUploadToken()}. + * upload. This completion requires the exact upload token obtained from + * {@link #getUploadToken()}. + * + *

Upload algorithm

+ * + * A remote client will have to follow this algorithm to upload a binary based + * on the information provided by this interface. + * *

- * Below is the detailed direct binary upload algorithm for the remote client. - *

- * In this example the following variables are used: + * Please be aware that if the size passed to + * {@link JackrabbitValueFactory#initiateBinaryUpload(long, int)} was an + * estimation, but the actual binary is larger, there is no guarantee the + * upload will be possible using all {@link #getUploadURIs()} and the + * {@link #getMaxPartSize()}. In such cases, the application should restart the + * transaction using the correct size. + * + *

Variables used

* * - * Steps: + *

Steps

*
    - *
  1. If (fileSize divided by maxPartSize) is larger than numUploadURIs, - * then the client cannot proceed and will have to request a new set of URIs - * with the right fileSize as maxSize - *
  2. If fileSize is smaller than minPartSize, then take the first provided - * upload URI to upload the entire binary, with partSize = fileSize
  3. *
  4. + * If {@code (fileSize / maxPartSize) > numUploadURIs}, then the + * client cannot proceed and will have to request a new set of URIs + * with the right fileSize as {@code maxSize} + *
  5. + *
  6. + * If {@code fileSize < minPartSize}, then take the first provided + * upload URI to upload the entire binary, with + * {@code partSize = fileSize} + *
  7. + *
  8. * (optional) If the client has more information to optimize, the - * partSize can be chosen, under the condition that all of these are - * true for the partSize: + * {@code partSize} can be chosen, under the condition that all of these are + * true: *
      - *
    1. larger than minPartSize - *
    2. smaller or equal than maxPartSize (unless it is -1 = - * unlimited) - *
    3. larger than fileSize divided by numUploadURIs + *
    4. {@code partSize >= minPartSize}
    5. + *
    6. {@code partSize <= maxPartSize} + * (unless {@code maxPartSize = -1} meaning unlimited)
    7. + *
    8. {@code partSize > (fileSize / numUploadURIs)}
    9. *
    *
  9. - *
  10. Otherwise all part URIs are to be used and the partSize = fileSize - * divided by numUploadURIs (integer division, discard modulo which will be - * the last part) - *
  11. Upload: segment the binary into partSize, for each segment take the - * next URI from uploadURIs (strictly in order), proceed with a standard - * HTTP PUT for each (for "http(s)" URIs, otherwise currently unspecified), - * and for the last part use whatever segment size is left - *
  12. If a segment fails during upload, retry (up to a certain time out) - *
  13. After the upload has finished successfully, notify the application, - * for example through a complete request, passing the {@link - * #getUploadToken() upload token}, and the application will call {@link - * JackrabbitValueFactory#completeBinaryUpload(String)} with the token + *
  14. + * Otherwise all part URIs are to be used. The {@code partSize} + * to use for all parts except the last would be calculated using: + *
    partSize = (fileSize + numUploadURIs - 1) / numUploadURIs
    + *
  15. + *
  16. + * Upload: split the binary into segments of {@code partSize} bytes; for each + * segment take the next URI from {@code uploadURIs} (strictly in order) and + * upload it with a standard HTTP PUT; the last part uses whatever segment + * size is left + *
  17. + *
  18. + * If a segment fails during upload, retry (up to a certain timeout) + *
  19. + *
  20. + * After the upload has finished successfully, notify the application, + * for example through a complete request, passing the {@link + * #getUploadToken() upload token}, and the application will call {@link + * JackrabbitValueFactory#completeBinaryUpload(String)} with the token + *
  21. *
* - *

JSON view

+ *

Example JSON view

* * A JSON representation of this interface as passed back to a remote client * might look like this: + * *
  * {
  *     "uploadToken": "aaaa-bbbb-cccc-dddd-eeee-ffff-gggg-hhhh",
@@ -110,46 +136,65 @@
  *     ]
  * }
  * 
- * */ + */ @ProviderType public interface BinaryUpload { /** - * Returns an Iterable of URIs that can be used for uploading binary data - * directly to a storage location. The first URI can be used for uploading - * binary data as a single entity, or multiple URIs can be used if the - * client wishes to do multi-part uploads. + * Returns a list of URIs that can be used for uploading binary data + * directly to a storage location in one or more parts. + * *

- * Clients are not necessarily required to use all of the URIs provided. A - * client may choose to use fewer, or even only one of the URIs. However, - * regardless of the number of URIs used, they must be consumed in sequence. + * Remote clients must support multi-part uploading as per the + * upload algorithm described above. Clients + * are not necessarily required to use all of the URIs provided. A client + * may choose to use fewer, or even only one of the URIs. However, it must + * always ensure the part size is between {@link #getMinPartSize()} and + * {@link #getMaxPartSize()}. These can reflect strict limitations of the + * storage provider. + * + *

+ * Regardless of the number of URIs used, they must be consumed in sequence, + * without skipping any, and the order of parts the original binary is split + * into must correspond exactly with the order of URIs. + * + *

* For example, if a client wishes to upload a binary in three parts and * there are five URIs returned, the client must use the first URI to * upload the first part, the second URI to upload the second part, and - * the third URI to upload the third part. The client is not required to - * use the fourth and fifth URIs. However, using the second URI to upload + * the third URI to upload the third part. The client is not required to + * use the fourth and fifth URIs. However, using the second URI to upload * the third part may result in either an upload failure or a corrupted * upload; likewise, skipping the second URI to use subsequent URIs may * result in either an upload failure or a corrupted upload. + * *

- * Clients should be aware that some storage providers have limitations on - * the minimum and maximum size of a binary payload for a single upload, so - * clients should take these limitations into account when deciding how many - * of the URIs to use. Underlying implementations may also choose to - * enforce their own limitations. - *

* While the API supports multi-part uploading via multiple upload URIs, - * implementations are not required to support multi-part uploading. If the + * implementations are not required to support multi-part uploading. If the * underlying implementation does not support multi-part uploading, a single * URI will be returned regardless of the size of the data being uploaded. + * *

- * Some storage providers also support multi-part uploads by reusing a - * single URI multiple times, in which case the implementation may also - * return a single URI regardless of the size of the data being uploaded. - *

- * You should consult both the DataStore implementation documentation and - * the storage service provider documentation for details on such matters as - * multi-part upload support, upload minimum and maximum sizes, etc. + * Security considerations: * + *

+ * * @return Iterable of URIs that can be used for uploading directly to a * storage location. */ @@ -157,54 +202,48 @@ Iterable<URI> getUploadURIs(); /** - * The smallest part size a client may upload for a multi-part upload, not - * counting the final part. This is usually either a service provider or - * implementation limitation. + * Return the smallest possible part size in bytes. If a consumer wants to + * choose a custom part size, it cannot be smaller than this value. This + * does not apply to the final part. This value will be equal to or larger + * than zero. + * *

- * Note that the API offers no guarantees that uploading parts of this size - * can successfully complete the requested upload using the URIs provided - * via {@link #getUploadURIs()}. In other words, clients wishing to perform - * a multi-part upload must split the upload into parts of at least this - * size, but the sizes may need to be larger in order to successfully - * complete the upload. + * Note that the API offers no guarantees that using this minimal part size + * is possible with the number of available {@link #getUploadURIs()}. This + * might not be the case if the binary is too large. Please refer to the + * upload algorithm for the correct use of + * this value. * - * @return The smallest size acceptable for multi-part uploads. + * @return The smallest part size acceptable for multi-part uploads. */ long getMinPartSize(); /** - * The largest part size a client may upload for a multi-part upload. This - * is usually either a service provider or implementation limitation. + * Return the largest possible part size in bytes. If a consumer wants to + * choose a custom part size, it cannot be larger than this value. + * If this returns -1, the maximum is unlimited. + * *

- * The API guarantees that a client can successfully complete a direct - * upload of the binary data of the requested size using the provided URIs - * by splitting the binary data into parts of the size returned by this - * method. - *

- * The client is not required to use part sizes of this size; smaller sizes - * may be used so long as they are at least as large as the size returned by - * {@link #getMinPartSize()}. - *

- * If the binary size specified by a client when calling {@link - * JackrabbitValueFactory#initiateBinaryUpload(long, int)} ends up being - * smaller than the actual size of the binary being uploaded, these API - * guarantees no longer apply, and it may not be possible to complete the - * upload using the URIs provided. In such cases, the client should restart - * the transaction using the correct size. + * The API guarantees that a client can split the binary of the requested + * size using this maximum part size and there will be sufficient URIs + * available in {@link #getUploadURIs()}. Please refer to the + * upload algorithm for the correct use of + * this value. * - * @return The maximum size of an upload part for multi-part uploads. + * @return The maximum part size acceptable for multi-part uploads or -1 + * if there is no limit. */ long getMaxPartSize(); /** - * Returns the upload token to be used in a subsequent call to {@link - * JackrabbitValueFactory#completeBinaryUpload(String)}. This upload token - * is used by the implementation to identify this upload. Clients should - * treat the upload token as an immutable string, as the underlying - * implementation may choose to implement techniques to detect tampering and - * reject the upload if the token is modified. + * Returns a token identifying this upload. This is required to finalize the upload + * at the end by calling {@link JackrabbitValueFactory#completeBinaryUpload(String)}. * - * @return This upload's unique upload token. + *

+ * The format of this string is implementation-dependent. Implementations must ensure + * that clients cannot guess tokens for existing binaries. + * + * @return A unique token identifying this upload. */ @NotNull String getUploadToken();
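Reviewer note: the part-size selection in the BinaryUpload "Upload algorithm" Javadoc can be illustrated with a small client-side sketch. It is not part of the patch and not the Jackrabbit implementation. The class and method names (`PartSizeSketch`, `choosePartSize`, `partCount`) are hypothetical. Two details are assumptions that go beyond the Javadoc text: the feasibility check in step 1 uses a ceiling division rather than plain integer division, and the computed part size is raised to `minPartSize` when the even split would fall below it, which means fewer URIs than provided are used.

```java
final class PartSizeSketch {

    /** Ceiling of a / b for a >= 0 and b > 0. */
    static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }

    /**
     * Picks a part size for all parts except the last, following the steps of
     * the documented upload algorithm. maxPartSize == -1 means "unlimited".
     */
    static long choosePartSize(long fileSize, long minPartSize,
                               long maxPartSize, int numUploadURIs) {
        if (fileSize <= 0 || numUploadURIs <= 0) {
            throw new IllegalArgumentException("fileSize and numUploadURIs must be positive");
        }
        boolean unlimited = (maxPartSize == -1);

        // Step 1: not enough URIs even at the maximum part size, so the
        // application has to restart the transaction with the right size.
        // Assumption: ceiling division, stricter than the Javadoc wording.
        if (!unlimited && ceilDiv(fileSize, maxPartSize) > numUploadURIs) {
            throw new IllegalStateException("too few upload URIs for this file size");
        }

        // Step 2: small binaries go to the first URI in a single part.
        if (fileSize < minPartSize) {
            return fileSize;
        }

        // Step 4: spread the binary evenly over all URIs.
        long partSize = ceilDiv(fileSize, numUploadURIs);

        // Assumption: never go below minPartSize; this uses fewer URIs.
        return Math.max(partSize, minPartSize);
    }

    /** Number of parts (and therefore URIs, taken in order) that will be used. */
    static long partCount(long fileSize, long partSize) {
        return ceilDiv(fileSize, partSize);
    }

    public static void main(String[] args) {
        long partSize = choosePartSize(100, 10, 40, 3);
        System.out.println("partSize=" + partSize
                + " parts=" + partCount(100, partSize));
    }
}
```

A remote client would then send each segment with an HTTP PUT to the corresponding URI, strictly in order, and finally pass the upload token back to the application.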