Apache Jena
  1. Apache Jena
  2. JENA-76

Resourceutils.renameResource uses Iterator.remove() to make changes - not all iterators support .remove.

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Jena
    • Labels:
      None

      Description

      Resourceutils.renameResource uses Iterator.remove() to make changes. TDB does not support Iterator.remove

      Instead, renameResource could grab a block (say, 1000) items, rename them and loop on blocks of 1000. This way, the iterator is terminated before updates are done. The nature of the rename operation means that the find iterator is executed until no resources are found so this batching mechanism does not need to need to track what has and has not been found - just repeat until less than 1000 items found.

      1. JENA-76-r1185845.patch
        5 kB
        Stephen Allen
      2. JENA-76-r1185803.patch
        5 kB
        Stephen Allen

        Activity

        Hide
        Thorsten Möller added a comment -

        Shouldn't this be a feature request (rather than a bug) having the subject "TDB should implement Iterator.remove()"? What is the reason for not considering this an option? Is it conceptually related that TDB cannot implement Iterator.remove()? The solution described above seems to introduce additional work (search repeatedly as long as something relevant is found versus search once only).

        Show
        Thorsten Möller added a comment - Shouldn't this be a feature request (rather than a bug) having the subject "TDB should implement Iterator.remove()"? What is the reason for not considering this an option? Is it conceptually related that TDB cannot implement Iterator.remove()? The solution described above seems to introduce additional work (search repeatedly as long as something relevant is found versus search once only).
        Hide
        Andy Seaborne added a comment -

        There is no requirement to implement Iterator.remove therefore the renameResource is making an assumption that can't be relied on.

        To implement Iterartor.remove in TDB is complex due to the fact there isn't a triple table - only indexes that need co-ordination. An iterator is only calculated over one index.

        The solution is unlikely to introduce noticable additional work for TDB due to cache effects. Also, when only a few statements are changed, it's only one block. A delayed change solution may need spill-to-disk at scale. (See JENA-45)

        The effects of requiring Iterator.remove on SDB are worse.

        Show
        Andy Seaborne added a comment - There is no requirement to implement Iterator.remove therefore the renameResource is making an assumption that can't be relied on. To implement Iterartor.remove in TDB is complex due to the fact there isn't a triple table - only indexes that need co-ordination. An iterator is only calculated over one index. The solution is unlikely to introduce noticable additional work for TDB due to cache effects. Also, when only a few statements are changed, it's only one block. A delayed change solution may need spill-to-disk at scale. (See JENA-45 ) The effects of requiring Iterator.remove on SDB are worse.
        Hide
        Thorsten Möller added a comment -

        I see (seems similar to the notorious view update problem). Still, the approach is not quite convincing since it creates behavior that one might want/need to optimize at a later point. If I understand the proposal correctly, then it is necessary to temporarily keep the blocks in main memory (i.e., it is not a streaming like approach). Hence, under main memory constraints one might want to keep the blocks rather small, ergo more repetitions. Conversely, under performance constraints one might want to make the blocks large in order to reduce repetitions.

        Show
        Thorsten Möller added a comment - I see (seems similar to the notorious view update problem). Still, the approach is not quite convincing since it creates behavior that one might want/need to optimize at a later point. If I understand the proposal correctly, then it is necessary to temporarily keep the blocks in main memory (i.e., it is not a streaming like approach). Hence, under main memory constraints one might want to keep the blocks rather small, ergo more repetitions. Conversely, under performance constraints one might want to make the blocks large in order to reduce repetitions.
        Hide
        Stephen Allen added a comment -

        Attached a patch that fixes this bug.

        It uses a block of 1000 triples in memory instead of a DataBag. This keeps things simpler as Andy mentioned.

        Show
        Stephen Allen added a comment - Attached a patch that fixes this bug. It uses a block of 1000 triples in memory instead of a DataBag. This keeps things simpler as Andy mentioned.
        Hide
        Stephen Allen added a comment -

        Improved the patch a bit by removing the need for storing two lists.

        Show
        Stephen Allen added a comment - Improved the patch a bit by removing the need for storing two lists.
        Hide
        Stephen Allen added a comment -

        Also alternatively you could replace the interleaved delete/add triple bit of the patch with something that might be more efficient (or use the BulkUpdater):

        for ( Triple t : triples )

        { rawGraph.delete(t) ; }

        for ( Triple t : triples )

        { Node oldS = t.getSubject(), oldO = t.getObject() ; Node newS = oldS.equals(resAsNode) ? newResAsNode : oldS ; Node newO = oldO.equals(resAsNode) ? newResAsNode : oldO ; rawGraph.add(Triple.create(newS, t.getPredicate(), newO)); }

        triples.clear();

        Show
        Stephen Allen added a comment - Also alternatively you could replace the interleaved delete/add triple bit of the patch with something that might be more efficient (or use the BulkUpdater): for ( Triple t : triples ) { rawGraph.delete(t) ; } for ( Triple t : triples ) { Node oldS = t.getSubject(), oldO = t.getObject() ; Node newS = oldS.equals(resAsNode) ? newResAsNode : oldS ; Node newO = oldO.equals(resAsNode) ? newResAsNode : oldO ; rawGraph.add(Triple.create(newS, t.getPredicate(), newO)); } triples.clear();
        Hide
        Stephen Allen added a comment -

        Fixed in revision 1190400.

        Show
        Stephen Allen added a comment - Fixed in revision 1190400.

          People

          • Assignee:
            Stephen Allen
            Reporter:
            Andy Seaborne
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development