Details
- Bug
- Status: Closed
- Critical
- Resolution: Fixed
- Jena 3.11.0, Jena 3.12.0
- None
- Linux 3.16.0-9-amd64 #1 SMP Debian 3.16.68-2 (2019-06-17) x86_64 GNU/Linux
java version "1.8.0_05"
Java(TM) SE Runtime Environment (build 1.8.0_05-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.5-b02, mixed mode)
Description
Issue: Inserting triples, rolling back the TDB2 dataset, and then loading nodes again (including some nodes with the same content as before) leaves the dataset in an inconsistent state: some nodes disappear and others are replaced. It also corrupts the database files unrecoverably: subsequent access to the triples may throw a RiotThriftException.
org.apache.jena.riot.thrift.RiotThriftException: No conversion to a Node: <RDF_Term >
Reproduction: Add some quads to a non-empty dataset, roll the transaction back, then add the same triples again in a different order, using anonymous (blank) nodes and URI nodes together. This does not trigger the issue every time, but the probability is high.
Cause: My investigation shows that the culprit is NodeTableCache. It caches the node-to-NodeId mapping of the underlying table (NodeTableTRDF), but the cache does not react to the rollback (abort) operation. During rollback, the underlying table invalidates the NodeIds allocated in the aborted transaction. A NodeId is closely tied to the position of the node's data in the node data file, so later inserts can reuse an invalidated NodeId, or one close to it, for a different node. The nodes that remain in the cache (but were never written) then overlap the newly written ones; reading them back causes Thrift errors, or later leaves nodes missing from the index. The data of the cached nodes disappears once they fall out of the cache or the dataset is reopened.
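The mechanism described above can be illustrated with a toy model. This is only a sketch and does not use any Jena code: ToyNodeFile and ToyCachingTable are hypothetical stand-ins for the node data file and the caching node table, and the id is simplified to a list index rather than a byte offset.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: none of these classes are Jena's. */
class StaleNodeCacheDemo {

    /** Append-only "node file": a node's id is its position in the file. */
    static class ToyNodeFile {
        private final List<String> nodes = new ArrayList<>();
        private int committedSize = 0;

        long append(String node) { nodes.add(node); return nodes.size() - 1; }
        String read(long id) { return nodes.get((int) id); }
        void commit() { committedSize = nodes.size(); }

        /** Rollback truncates to the last committed length, freeing the ids. */
        void abort() { nodes.subList(committedSize, nodes.size()).clear(); }
    }

    /** Cache in front of the file that is never told about abort(). */
    static class ToyCachingTable {
        private final ToyNodeFile file;
        private final Map<String, Long> nodeToId = new HashMap<>();

        ToyCachingTable(ToyNodeFile file) { this.file = file; }

        long getOrAllocate(String node) {
            // On a cache hit nothing is written to the file.
            return nodeToId.computeIfAbsent(node, file::append);
        }
    }

    /** Returns what the file holds at the id the cache reports for "_:b0". */
    static String run() {
        ToyNodeFile file = new ToyNodeFile();
        ToyCachingTable table = new ToyCachingTable(file);

        table.getOrAllocate("<http://example/s>");   // id 0
        file.commit();

        table.getOrAllocate("_:b0");                 // id 1, uncommitted
        file.abort();                                // id 1 is free again; cache still maps _:b0 to 1

        table.getOrAllocate("<http://example/p>");   // new node reuses id 1
        long staleId = table.getOrAllocate("_:b0");  // cache hit: still 1
        return file.read(staleId);
    }

    public static void main(String[] args) {
        System.out.println("_:b0 resolves to " + run());
    }
}
```

In this model the blank node ends up pointing at the data of a different node, which corresponds to the "replaced" nodes in the report. With real byte offsets the stale id can also land in the middle of another record, which would be consistent with the Thrift decoding error.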
Possible fix: None of the NodeTables registers for or reacts to the rollback; only the backing file and index are restored. The best solution is to give these components a way to react to the restoration. The cache could then evict all cached data, or track the changes made within a transaction and evict only those. In any case, evicting all the caches on rollback is easy to justify.
TransactionCoordinator has collections for shutdownHooks and for transaction components. The same pattern could be used to add another collection of notification interfaces that are called back on transactional events. NodeTableCache (and other objects) could then listen for these events and evict the cache when necessary.
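A minimal sketch of the listener idea above, assuming a hypothetical callback interface. ToyTxnListener, ToyCoordinator and ToyNodeCache are illustrative names only; they are not existing Jena types, and the real TransactionCoordinator may need a different hook point.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CopyOnWriteArrayList;

/** Sketch only: a coordinator that notifies listeners on abort. */
class AbortListenerSketch {

    /** Hypothetical callback interface; not an existing Jena type. */
    interface ToyTxnListener {
        void onAbort();
    }

    /** Stand-in for a coordinator that keeps a listener collection. */
    static class ToyCoordinator {
        private final List<ToyTxnListener> listeners = new CopyOnWriteArrayList<>();

        void addListener(ToyTxnListener listener) { listeners.add(listener); }

        void abort() {
            // The backing file and index would be restored here,
            // then every registered listener is told about the abort.
            for (ToyTxnListener listener : listeners) {
                listener.onAbort();
            }
        }
    }

    /** Cache that simply drops everything when a transaction aborts. */
    static class ToyNodeCache implements ToyTxnListener {
        private final Map<String, Long> nodeToId = new HashMap<>();

        void put(String node, long id) { nodeToId.put(node, id); }
        Long get(String node) { return nodeToId.get(node); }
        int size() { return nodeToId.size(); }

        @Override
        public void onAbort() { nodeToId.clear(); }
    }

    /** Registers a cache, fills it, aborts, and reports how many entries remain. */
    static int sizeAfterAbort() {
        ToyCoordinator coordinator = new ToyCoordinator();
        ToyNodeCache cache = new ToyNodeCache();
        coordinator.addListener(cache);

        cache.put("_:b0", 1L);
        coordinator.abort();
        return cache.size();
    }

    public static void main(String[] args) {
        System.out.println("entries after abort: " + sizeAfterAbort());
    }
}
```

Clearing the whole cache is the bluntest option; a listener that also receives begin and commit events could track per-transaction additions and evict only those.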
Another possibility is to add a callback to NodeTable that reacts to the invalidation and propagates it back up the NodeTable hierarchy.
Another, simpler fix is to propagate the thread-safe storage "version" down through the NodeTables, check it in the cache, and evict when it has changed.
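The version check could look roughly like the following sketch. It assumes the storage exposes a counter that changes on every abort; ToyStorage, its epoch() method and ToyVersionedCache are hypothetical and only illustrate the comparison, not how TDB2 tracks its data version.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** Sketch only: a cache that evicts itself when the storage version moves. */
class VersionedCacheSketch {

    /** Hypothetical storage whose epoch counter is bumped on every abort. */
    static class ToyStorage {
        private final AtomicLong epoch = new AtomicLong();

        long epoch() { return epoch.get(); }
        void abort() { epoch.incrementAndGet(); }
    }

    static class ToyVersionedCache {
        private final ToyStorage storage;
        private final Map<String, Long> nodeToId = new HashMap<>();
        private long seenEpoch;

        ToyVersionedCache(ToyStorage storage) {
            this.storage = storage;
            this.seenEpoch = storage.epoch();
        }

        /** Drops all entries if the storage has aborted since the last access. */
        private void evictIfStale() {
            long current = storage.epoch();
            if (current != seenEpoch) {
                nodeToId.clear();
                seenEpoch = current;
            }
        }

        synchronized void put(String node, long id) {
            evictIfStale();
            nodeToId.put(node, id);
        }

        synchronized Long get(String node) {
            evictIfStale();
            return nodeToId.get(node);
        }
    }

    /** Caches an id, aborts, and looks the node up again. */
    static Long lookupAfterAbort() {
        ToyStorage storage = new ToyStorage();
        ToyVersionedCache cache = new ToyVersionedCache(storage);

        cache.put("_:b0", 1L);
        storage.abort();
        return cache.get("_:b0"); // the entry was evicted by the epoch check
    }

    public static void main(String[] args) {
        System.out.println("lookup after abort: " + lookupAfterAbort());
    }
}
```

This variant needs no registration with the coordinator, at the cost of one extra comparison on every cache access.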
Workaround: Bypassing the cache (setting nodeToIdCacheSize and idToNodeCacheSize to -1 in StoreParams) works for now, but it hurts performance.