[OAK-6659] Cold standby should fail loudly when a big blob can't be timely transferred - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: 1.7.6
Fix Version/s: 1.7.8, 1.8.0
Component/s: segment-tar, tarmk-standby
Labels:
- cold-standby

Description

Due to changes made in OAK-4969, there are currently two 'sync blob' cycles triggered by StandbyDiff#childNodeChanged. The test scenario is the same as the one in DataStoreTestBase#testSyncBigBlob: a new big blob (1GB) is added to the primary file store, and then a standby sync is triggered to sync this content to the secondary file store.

The first 'sync blob' cycle happens as a result of #process being called in StandbyDiff#childNodeChanged. A new 'get blob' request is created on the client and the server starts sending chunks of the big blob. If transferring the entire blob from server to client takes longer than readTimeoutMs, StandbyDiff#readBlob correctly throws an IllegalStateException, but the catch clause in StandbyDiff#childNodeChanged swallows it. A second 'sync blob' cycle is then triggered and, if readTimeoutMs * 2 is enough, the blob is synced on the standby. This happens because the server keeps sending the remaining chunks after the IllegalStateException was thrown in the first 'sync blob' cycle.
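
The difference between swallowing the timeout and failing loudly can be illustrated with a minimal, self-contained sketch. It is not the actual StandbyDiff code: the class SwallowedTimeoutSketch, the BlobReader interface and the helper timesOutOnce are hypothetical names introduced only for illustration, and the retry inside the catch clause is a simplification of how the second cycle gets triggered.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical simplification; none of these types exist in Oak.
class SwallowedTimeoutSketch {

    /** Stand-in for the client-side blob read; throws IllegalStateException on timeout. */
    interface BlobReader {
        void readBlob(String blobId);
    }

    /** Swallowing variant: the timeout is hidden and a second 'sync blob' cycle starts. */
    static void syncSwallowing(BlobReader reader, String blobId) {
        try {
            reader.readBlob(blobId);
        } catch (IllegalStateException e) {
            // The caller never learns about the timeout; the sync is simply attempted again.
            reader.readBlob(blobId);
        }
    }

    /** Loud variant: the timeout propagates to the caller, so the sync fails visibly. */
    static void syncFailingLoudly(BlobReader reader, String blobId) {
        reader.readBlob(blobId);
    }

    /** Reader that times out on the first attempt and succeeds afterwards. */
    static BlobReader timesOutOnce(AtomicInteger attempts) {
        return blobId -> {
            if (attempts.incrementAndGet() == 1) {
                throw new IllegalStateException("Timed out reading blob " + blobId);
            }
        };
    }

    public static void main(String[] args) {
        AtomicInteger attempts = new AtomicInteger();
        syncSwallowing(timesOutOnce(attempts), "blob-1");
        System.out.println("swallowing variant, attempts: " + attempts.get());

        try {
            syncFailingLoudly(timesOutOnce(new AtomicInteger()), "blob-1");
        } catch (IllegalStateException e) {
            System.out.println("loud variant failed: " + e.getMessage());
        }
    }
}
```

In the swallowing variant the caller sees a successful sync even though the first cycle timed out, which is the behaviour this issue asks to replace with a visible failure.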

The consequence of these two 'sync blob' cycles is that deleting the temporary file to which chunks are spooled on the client sometimes fails (on Windows, for example; see OAK-6641 specifically). In that case, instead of the previous incomplete transfer being deleted, new chunks from the second 'sync blob' cycle are added to it. The blob persisted in the blob store on the client then does not have the same size and id as the original blob sent by the server.
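
The effect of a failed delete on the spooled file can be reproduced with a small sketch. It assumes, purely for illustration, that chunks are appended to a temporary file and that a failed delete leaves the partial file in place; SpoolFileSketch, CHUNK_SIZE, spool and transfer are hypothetical names and do not correspond to Oak classes or methods.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration of the spool-file problem; not Oak code.
class SpoolFileSketch {

    static final int CHUNK_SIZE = 1024;

    /** Appends the given number of fixed-size chunks to the spool file. */
    static void spool(Path file, int chunks) throws IOException {
        byte[] chunk = new byte[CHUNK_SIZE];
        for (int i = 0; i < chunks; i++) {
            Files.write(file, chunk, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
    }

    /**
     * Simulates two 'sync blob' cycles: the first is cut short after
     * chunksBeforeTimeout chunks, the second delivers all totalChunks.
     * Returns the size in bytes of the file that would be persisted.
     */
    static long transfer(int totalChunks, int chunksBeforeTimeout, boolean deleteSucceeds) {
        try {
            Path file = Files.createTempFile("blob-spool", ".tmp");
            try {
                spool(file, chunksBeforeTimeout);   // first cycle, interrupted by the timeout
                if (deleteSucceeds) {
                    Files.delete(file);             // incomplete transfer is discarded
                }
                spool(file, totalChunks);           // second cycle
                return Files.size(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("delete succeeded: " + transfer(5, 3, true) + " bytes");
        System.out.println("delete failed:    " + transfer(5, 3, false) + " bytes");
    }
}
```

With five chunks of 1024 bytes, the blob should occupy 5120 bytes; when the delete is skipped, the file ends up with 8192 bytes, so a size comparison against the blob sent by the server would expose the mismatch.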

Attachments

OAK-6659.patch
14/Sep/17 11:47
8 kB
Andrei Dulceanu

Issue Links

blocks

OAK-6641 test failure in org.apache.jackrabbit.oak.segment.standby.ExternalPrivateStoreIT

Closed

relates to

OAK-6661 ResponseDecoder should check that the length of the received blob matches the length of the sent blob

Closed

People

Assignee: Andrei Dulceanu

Reporter: Andrei Dulceanu

Votes: 0

Watchers: 4

Dates

Created: 13/Sep/17 12:12

Updated: 04/Oct/19 17:52

Resolved: 14/Sep/17 15:15