The current SegmentBlob.equals() method only checks for reference equality before falling back to AbstractBlob.equals(), which scans the entire byte stream.
This works well in the majority of cases, where a binary won't change at all, or at least not often. However, some clients frequently update a binary or even rewrite it with exactly the same contents. We should optimize the handling of those cases as well.
Some ideas on different things we can/should do:
- Make AbstractBlob.equals() compare the blob lengths before scanning the byte streams. If a blob has changed, its length is likely also different, in which case the length check provides a quick shortcut.
- Keep a simple checksum like Adler-32 along with medium-sized value records and with the block record references of a large value record. Compare those checksums before falling back to a full byte scan. This should catch practically all cases where the binaries differ despite having equal lengths, though still not the case where they are actually equal.
- When updating a binary value, do an equality check against the previous value and reuse the previous value if they are equal. The extra cost of this check should already be recovered when the commit hooks that look at the change no longer have to consider an unchanged binary.
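The first two ideas could be combined into a chain of increasingly expensive checks. The following is only a rough sketch using byte arrays (the class and method names are hypothetical, and a real implementation would work on streams and on checksums stored alongside the value records rather than recomputed on the fly):

```java
import java.util.Arrays;
import java.util.zip.Adler32;

// Hypothetical sketch of the proposed equality shortcuts; not actual Oak code.
final class BlobEquality {

    // Cheap checks first: reference equality, then length, then an
    // Adler-32 checksum, and only then a full byte-by-byte comparison.
    static boolean blobEquals(byte[] a, byte[] b) {
        if (a == b) {
            return true;                  // reference equality
        }
        if (a.length != b.length) {
            return false;                 // different lengths => not equal
        }
        if (checksum(a) != checksum(b)) {
            return false;                 // different checksums => not equal
        }
        return Arrays.equals(a, b);       // full byte scan as a last resort
    }

    // In a real implementation this value would be stored with the record
    // instead of being recomputed here.
    static long checksum(byte[] data) {
        Adler32 adler = new Adler32();
        adler.update(data, 0, data.length);
        return adler.getValue();
    }
}
```

Since Adler-32 is not collision free, a matching checksum only means the blobs *may* be equal, so the full scan is still needed in that case; a mismatch, however, is a definitive negative answer.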