Uploaded image for project: 'Jackrabbit Oak'
  1. Jackrabbit Oak
  2. OAK-1266 DocumentStore implementation for relational databases
  3. OAK-8977

RDBBlobStore performance improvement for SQL Server (MSSQL)

Attach filesAttach ScreenshotAdd voteVotersWatch issueWatchersLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments


    • Technical task
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • None
    • rdbmk
    • None


      The performance test done in OAK-8926 indicates that one or more aspects of using SQL Server as the database need to be addressed (also MySQL, but this issue only concerns SQL Server).


      OAK-8963 (specify a binary collation in the CREATE TABLE statement) will improve performance a little, but not nearly enough to get its performance up to the range of the other non-MySQL databases.

      I did some testing and found one area that can be improved, and that should be feasible to improve. The current DB access pattern is to submit parameterized dynamic SQL. JDBC handles this as a "prepared" statement. I did some research four years ago on prepared statements ( What is the sense and benefit of using SqlCommand.Prepare()? on DBA.StackExchange ) and found that:

      1. prepared statements, even when parameterized, don't cache the execution plan between connections (even if the connection is pooled and does not need to re-authenticate)
      2. prepared statements only re-use the execution plan when executing the same prepared statement with different parameters in the same session (the Java code opens and closes the connection per each call)

      One way to improve performance is to:

      1. put the T-SQL code into a stored procedure (that way SQL Server does not need to parse it per each call and can re-use an execution plan), and
      2. call the stored procedure using a "callable statement": Using a stored procedure with input parameters

      I tested this by first copying the basic sequence of T-SQL statements found in the storeBlockInDatabase method (assuming that everything always worked / happy path / no error handling) and simulating the new testInsertSmallBlobs() test, first as dynamic SQL, and then as a stored procedure:

      ALTER PROCEDURE dbo.[StoreBlockInDatabase]
      	@Digest VARBINARY(32),
      	@Level INT,
      	@Data VARBINARY(MAX)
      DECLARE @Id VARCHAR(64) = CONVERT(VARCHAR(64), @Digest, 2),
      		@Now BIGINT;
      SELECT @Now = [cpu_ticks]
      FROM   sys.dm_os_sys_info;
      	SET    [LASTMOD] = @Now
      	WHERE  [ID] = @Id;
      	IF (@@ROWCOUNT = 0)
      		INSERT INTO dbo.[DATASTORE_DATA] ([ID], [DATA]) VALUES (@Id, @Data);
      		INSERT INTO dbo.[DATASTORE_META] ([ID], [LVL], [LASTMOD]) VALUES (@Id, @Level, @Now);
      END TRY
      END CATCH;

      I tested against both the Latin1_General_CI_AS collation and the Latin1_General_BIN2 collation:

      • The stored procedure inserts 2.5 times more rows than the dynamic SQL version
      • The binary collation is slightly better than the non-binary collation per each test

      I will attach the test script once I get it cleaned up.

      P.S. I also tested with another pattern to see if doing a SELECT first (rather than assuming the UPDATE) would be better (based on my testing for a similar case: UPDATE performance where no data changes on DBA.StackExchange), but it wasn't better.

      P.P.S. I figure the CREATE PROCEDURE statement could be placed in the vendor-specific area of RDBBlobStoreDB


        Issue Links


          This comment will be Viewable by All Users Viewable by All Users


            Unassigned Unassigned
            Solomon.Rutzky Solomon Rutzky




                Issue deployment