  1. Apache Jena
  2. JENA-1746

TDB2 rollback method clashes with nodetable cache


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Jena 3.11.0, Jena 3.12.0
    • Fix Version/s: Jena 3.13.0
    • Component/s: TDB2
    • Labels: None
    • Environment: Linux 3.16.0-9-amd64 #1 SMP Debian 3.16.68-2 (2019-06-17) x86_64 GNU/Linux

      java version "1.8.0_05"
      Java(TM) SE Runtime Environment (build 1.8.0_05-b13)
      Java HotSpot(TM) 64-Bit Server VM (build 25.5-b02, mixed mode)

    Description

      Issue: Inserting triples into a TDB2 dataset, rolling the transaction back, and then loading nodes again (including some nodes with the same content as before) produces inconsistent results: some nodes disappear and some are replaced by others. It also corrupts the database files unrecoverably: accessing the triples afterwards may throw a RiotThriftException.

      org.apache.jena.riot.thrift.RiotThriftException: No conversion to a Node: <RDF_Term >

      Reproduction: Insert some quads into a non-empty dataset, roll the transaction back, then insert the same triples again in a different order, mixing anonymous (blank) nodes and URI nodes. This does not trigger the issue every time, but the probability is high.

      Cause: My investigation shows that the culprit is the NodeTableCache. It caches the node-to-node-id mapping of the backing table (NodeTableTRDF), but it does not react to the rollback (abort) operation. During a rollback the backing table invalidates the node ids allocated in the aborted transaction. A node id is closely tied to the position of the node's data in the node data file, so later inserts can reuse an invalidated node id, or one close to it, for other nodes. The nodes that remain in the cache but were never written then overlap with the new ones, so reading them back causes Thrift errors, or later leads to nodes missing from the index. The data of the cached nodes disappears once they fall out of the cache or the dataset is reopened.
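The mechanism described above can be illustrated with a small toy model. This is only a sketch under simplifying assumptions, not Jena code: ToyNodeFile, ToyCachedTable and StaleCacheDemo are hypothetical names, a list index stands in for the file offset that a node id is derived from, and the real lookup through the index is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Demo entry point: a cache entry survives an abort and collides with a new node. */
class StaleCacheDemo {
    public static void main(String[] args) {
        ToyNodeFile file = new ToyNodeFile();
        ToyCachedTable table = new ToyCachedTable(file);

        table.getOrAllocate("<http://example/a>");            // existing, committed data
        file.markCommitted();

        long idB = table.getOrAllocate("_:b1");                // written inside a transaction
        file.abort();                                          // file shrinks; cache is untouched

        long idC = table.getOrAllocate("<http://example/c>"); // reuses the freed id
        long staleB = table.getOrAllocate("_:b1");            // answered from the stale cache

        System.out.println("two nodes share one id: " + (idC == staleB && idB == staleB));
        System.out.println("stored under that id: " + file.read(staleB));
    }
}

/** Append-only store (hypothetical): a node's id is its position, like a file offset. */
class ToyNodeFile {
    private final List<String> nodes = new ArrayList<>();
    private int committedSize = 0;

    long append(String node) { nodes.add(node); return nodes.size() - 1; }
    String read(long id) { return nodes.get((int) id); }
    void markCommitted() { committedSize = nodes.size(); }

    /** Rollback: drop everything written since the last commit. */
    void abort() { nodes.subList(committedSize, nodes.size()).clear(); }
}

/** Cache in front of the store (hypothetical); nobody tells it about abort(). */
class ToyCachedTable {
    private final ToyNodeFile file;
    private final Map<String, Long> nodeToId = new HashMap<>();

    ToyCachedTable(ToyNodeFile file) { this.file = file; }

    long getOrAllocate(String node) {
        Long cached = nodeToId.get(node);
        if (cached != null) return cached;   // may be an id the store no longer honours
        long id = file.append(node);
        nodeToId.put(node, id);
        return id;
    }
}
```

In this model the blank node and the new URI node end up with the same id, and only the URI node is actually stored there. With real byte offsets the overlap can also be partial, which would be consistent with the Thrift decoding error reported above.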

      Possible fix: None of the NodeTables registers for, or reacts to, a rollback; only the backing file and the index are restored. The best solution is probably to give these components a way to react to the restoration. The cache could then evict everything, or track the changes made in each transaction and evict only those. Either way, evicting all caches on rollback is easy to justify.
      TransactionCoordinator has collections for shutdownHooks and for transactionsComponents. The same pattern could be used for another collection of notification interfaces that are called back on transactional events. NodeTableCache (and other objects) could then listen for these events and evict the cache when necessary.
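The listener idea could look roughly like the following sketch. All names here (ToyTxnListener, ToyCoordinator, EvictingCache, ListenerDemo) are hypothetical and only model the pattern; they are not part of Jena's TransactionCoordinator API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Demo entry point: the cache is emptied when the coordinator reports an abort. */
class ListenerDemo {
    public static void main(String[] args) {
        ToyCoordinator coordinator = new ToyCoordinator();
        EvictingCache cache = new EvictingCache();
        coordinator.addListener(cache);

        cache.put("_:b1", 1L);
        coordinator.abort();
        System.out.println("entries after abort: " + cache.size());
    }
}

/** Hypothetical callback interface for transactional events. */
interface ToyTxnListener {
    void onCommit();
    void onAbort();
}

/** Stand-in for a coordinator that keeps one more collection: listeners. */
class ToyCoordinator {
    private final List<ToyTxnListener> listeners = new ArrayList<>();

    void addListener(ToyTxnListener listener) { listeners.add(listener); }

    void commit() {
        for (ToyTxnListener listener : listeners) listener.onCommit();
    }

    void abort() {
        // A real coordinator would restore files and indexes first, then notify.
        for (ToyTxnListener listener : listeners) listener.onAbort();
    }
}

/** A cache that registers as a listener and evicts everything on abort. */
class EvictingCache implements ToyTxnListener {
    private final Map<String, Long> nodeToId = new HashMap<>();

    void put(String node, long id) { nodeToId.put(node, id); }
    Long get(String node) { return nodeToId.get(node); }
    int size() { return nodeToId.size(); }

    @Override public void onCommit() { /* cached entries are now durable; keep them */ }
    @Override public void onAbort() { nodeToId.clear(); }
}
```

Clearing the whole cache is the coarse variant; tracking the entries added per transaction and evicting only those would keep more of the cache warm at the cost of extra bookkeeping.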

      Another possibility is to add a callback to NodeTable that reacts to the invalidation and propagates it back up the NodeTable hierarchy.

      A simpler fix is to propagate the thread-safe storage "version" down through the NodeTables, check it in the cache, and evict when it has changed.
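A minimal sketch of the version check follows. It assumes that the storage layer bumps a counter on every abort; that behaviour and the names ToyStorageVersion, VersionedCache and VersionCheckDemo are assumptions made for illustration, not existing Jena classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** Demo entry point: a lookup after an abort finds the cache already emptied. */
class VersionCheckDemo {
    public static void main(String[] args) {
        ToyStorageVersion storage = new ToyStorageVersion();
        VersionedCache cache = new VersionedCache(storage);

        cache.put("_:b1", 1L);
        System.out.println("before abort: " + cache.get("_:b1"));
        storage.bumpOnAbort();
        System.out.println("after abort: " + cache.get("_:b1"));
    }
}

/** Hypothetical thread-safe storage version, assumed to change on every abort. */
class ToyStorageVersion {
    private final AtomicLong version = new AtomicLong();

    long current() { return version.get(); }
    void bumpOnAbort() { version.incrementAndGet(); }
}

/** Cache that compares the storage version on every access and evicts on change. */
class VersionedCache {
    private final ToyStorageVersion storage;
    private final Map<String, Long> nodeToId = new HashMap<>();
    private long seenVersion;

    VersionedCache(ToyStorageVersion storage) {
        this.storage = storage;
        this.seenVersion = storage.current();
    }

    synchronized Long get(String node) {
        evictIfStale();
        return nodeToId.get(node);
    }

    synchronized void put(String node, long id) {
        evictIfStale();
        nodeToId.put(node, id);
    }

    /** Drop all entries if the storage version moved since the last access. */
    private void evictIfStale() {
        long now = storage.current();
        if (now != seenVersion) {
            nodeToId.clear();
            seenVersion = now;
        }
    }
}
```

This variant needs no registration or callbacks; the cost is one version comparison per cache access.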

      Workaround: Bypassing the cache (setting nodeToIdCacheSize and idToNodeCacheSize to -1 in StoreParams) works for now, but it costs performance.

       

      Attachments

        1. jena-test.tgz
          53 kB
          Miklós Győrfi

            People

              Assignee: Andy Seaborne (andy)
              Reporter: Miklós Győrfi (gyorfimi)
              Votes: 0
              Watchers: 3

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m