Details
Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Version: 1.7.6
Description
Due to changes made in OAK-4969, StandbyDiff#childNodeChanged currently triggers two 'sync blob' cycles. The test scenario is the same as in DataStoreTestBase#testSyncBigBlob: a new big blob (1 GB) is added to the primary file store, and a standby sync is then triggered to copy this content to the secondary file store.
The first 'sync blob' cycle happens when #process is called in StandbyDiff#childNodeChanged. A new 'get blob' request is created on the client, and the server starts sending chunks of the big blob. If the time needed to transfer the entire blob from server to client exceeds readTimeoutMs, StandbyDiff#readBlob correctly throws an IllegalStateException, but StandbyDiff#childNodeChanged swallows it in its catch clause and a second 'sync blob' cycle is triggered. This second cycle might succeed with the same readTimeoutMs that made the first one fail: if readTimeoutMs * 2 is enough, the blob is synced on the standby. This happens because the server keeps sending the remaining chunks after the IllegalStateException has been thrown in the first 'sync blob' cycle.
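The timing effect described above can be illustrated with a small, self-contained model. This is a sketch under stated assumptions, not Oak code: the Transfer class is hypothetical and reduces the server-side transfer to a single progress counter that keeps advancing across client cycles, and the readBlob and childNodeChanged methods only loosely mirror the control flow of StandbyDiff#readBlob and StandbyDiff#childNodeChanged (first failure swallowed, second cycle run against the same ongoing transfer).

```java
public class DoubleSyncCycleSketch {

    /** Hypothetical model of a server-side transfer that keeps running after a client timeout. */
    static final class Transfer {
        private final long totalMs; // time the server needs to send the whole blob
        private long elapsedMs = 0; // progress so far, shared by all client cycles

        Transfer(long totalMs) {
            this.totalMs = totalMs;
        }

        /** Lets the transfer progress for up to budgetMs; returns true once it has completed. */
        boolean advance(long budgetMs) {
            elapsedMs = Math.min(totalMs, elapsedMs + budgetMs);
            return elapsedMs >= totalMs;
        }
    }

    /** Loosely mirrors StandbyDiff#readBlob: fails if the blob does not arrive within readTimeoutMs. */
    static void readBlob(Transfer transfer, long readTimeoutMs) {
        if (!transfer.advance(readTimeoutMs)) {
            throw new IllegalStateException("Timeout reading blob");
        }
    }

    /**
     * Loosely mirrors StandbyDiff#childNodeChanged: the first failure is swallowed and a
     * second cycle runs. Returns the number of cycles needed; a second failure propagates.
     */
    static int childNodeChanged(Transfer transfer, long readTimeoutMs) {
        try {
            readBlob(transfer, readTimeoutMs); // first 'sync blob' cycle
            return 1;
        } catch (IllegalStateException e) {
            // swallowed: nothing tells the server to stop sending, so progress is kept
        }
        readBlob(transfer, readTimeoutMs); // second 'sync blob' cycle, may still throw
        return 2;
    }

    public static void main(String[] args) {
        // 1500 ms transfer with a 1000 ms timeout: the first cycle fails, the second "succeeds"
        System.out.println(childNodeChanged(new Transfer(1500), 1000)); // prints 2
    }
}
```

In this model any transfer time up to 2 * readTimeoutMs ends in an apparent success after two cycles, which matches the observation that the effective timeout is doubled.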
The consequence of these two 'sync blob' cycles is that deleting the temporary file to which the client spools chunks sometimes fails (for example on Windows, as described in OAK-6641). When that happens, the incomplete transfer from the first cycle is not deleted, and the new chunks from the second 'sync blob' cycle are added to it. As a result, the blob persisted in the client's blob store does not have the same size and id as the original blob sent by the server.
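The corruption can be reproduced in miniature with plain java.nio file operations. This is a hypothetical sketch, not Oak's client code: it assumes the client appends chunks to a single spool file, assumes the second cycle spools a complete copy of the blob, simulates the failed delete with a deleteSucceeds flag instead of a real Windows file lock, and uses a SHA-256 content hash as a stand-in for the blob id.

```java
import java.io.IOException;
import java.math.BigInteger;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class SpoolFileSketch {

    /** Appends chunks to the spool file, the way a client accumulates streamed chunks. */
    static void spool(Path file, byte[][] chunks) throws IOException {
        for (byte[] chunk : chunks) {
            Files.write(file, chunk, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
    }

    /** Stand-in for a content-derived blob id (SHA-256 is an assumption, not Oak's scheme). */
    static String contentId(Path file) throws IOException, NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(Files.readAllBytes(file));
        return String.format("%064x", new BigInteger(1, digest));
    }

    /**
     * Runs an incomplete first cycle followed by a complete second cycle into the same spool
     * file. deleteSucceeds simulates whether cleaning up the first cycle's file worked.
     */
    static Path syncTwice(byte[][] blob, int chunksBeforeTimeout, boolean deleteSucceeds)
            throws IOException {
        Path spoolFile = Files.createTempFile("blob-spool", ".tmp");
        // first cycle: only a prefix of the chunks arrives before the timeout
        spool(spoolFile, Arrays.copyOf(blob, chunksBeforeTimeout));
        if (deleteSucceeds) {
            Files.delete(spoolFile); // clean slate for the second cycle
        }
        // second cycle (assumed here to deliver the whole blob) appends to whatever is left
        spool(spoolFile, blob);
        return spoolFile;
    }

    public static void main(String[] args) throws Exception {
        byte[][] blob = {{1, 2}, {3, 4}, {5, 6}};
        Path clean = syncTwice(blob, 2, true);
        Path dirty = syncTwice(blob, 2, false);
        System.out.println(Files.size(clean) + " vs " + Files.size(dirty)); // prints "6 vs 10"
        System.out.println(contentId(clean).equals(contentId(dirty)));     // prints "false"
        Files.deleteIfExists(clean);
        Files.deleteIfExists(dirty);
    }
}
```

When the delete is skipped, the leftover prefix from the first cycle stays at the start of the file, so both the size and the content-derived id diverge from the blob the server actually sent.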