Lucene - Core
  1. Lucene - Core
  2. LUCENE-4061

Improvements to DirectoryTaxonomyWriter (synchronization and others)

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-ALPHA, 6.0
    • Component/s: modules/facet
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      DirTaxoWriter synchronizes in too many places. For instance addCategory() is fully synchronized, while only a small part of it needs to be.

      Additionally, getCacheMemoryUsage looks bogus - it depends on the type of the TaxoWriterCache. No code uses it, so I'd like to remove it – whoever is interested can query the specific cache impl it has. Currently, only Cl2oTaxoWriterCache supports it.

      If the changes will be simple, I'll port them to 3.6.1 as well.

      1. LUCENE-4061.patch
        34 kB
        Shai Erera
      2. LUCENE-4061.patch
        25 kB
        Shai Erera

        Activity

        Hide
        Shai Erera added a comment -

        Patch does the following:

        • Removes getCacheMemoryUsage()
        • Handles few TODOs (e.g. don't use MultiFields)
        • Improves synchronization of addCategory – only if the category is not found, then it synchronizes and proceeds with the original logic.
        • Made addCategoryDocument private – this is a very dangerous method to call by extending classes.
        • Made the TaxoWriterCache impls thread-safe.
        • Added a concurrency test to validate synchronization works
        • Modified TestAddTaxonomy to run much faster by building the taxonomy in parallel.

        All tests pass, and run even faster now. I plan to commit this soon.

        Show
        Shai Erera added a comment - Patch does the following: Removes getCacheMemoryUsage() Handles few TODOs (e.g. don't use MultiFields) Improves synchronization of addCategory – only if the category is not found, then it synchronizes and proceeds with the original logic. Made addCategoryDocument private – this is a very dangerous method to call by extending classes. Made the TaxoWriterCache impls thread-safe. Added a concurrency test to validate synchronization works Modified TestAddTaxonomy to run much faster by building the taxonomy in parallel. All tests pass, and run even faster now. I plan to commit this soon.
        Hide
        Shai Erera added a comment -

        Committed rev 1339150

        Show
        Shai Erera added a comment - Committed rev 1339150
        Hide
        Shai Erera added a comment -

        TestDirectoryTaxonomyWriter.testConcurrency sometimes fails on mismatch number of categories.

        This is not repeated easily, but if run with -Dtests.iters, it is.

        Show
        Shai Erera added a comment - TestDirectoryTaxonomyWriter.testConcurrency sometimes fails on mismatch number of categories. This is not repeated easily, but if run with -Dtests.iters, it is.
        Hide
        Shai Erera added a comment -

        Fixing the concurrency issue was hairy, and required lots of changes to DirTaxoWriter:

        • Needed a ReaderManager, so added such in core under o.a.l.index. Separately, I think that we should move RefManager to o.a.l.util instead of o.a.l.search.
        • DirTaxoWriter was not very well built for concurrency , so many changes had to be done to it.
        • TaxoWriterCache.hasRoom(int) has been replaced by isFull().
        • TestDirTaxoWriter has been enhanced to sometimes, during nightly builds, use a NoOpCache, as it uncovered some bugs too ! (yet it makes the test horribly slow, hence why the nightly criteria, and very low chances still).

        I ran DirTaxoWriter.testConcurrency over 1000 times and no failures, so I'm inclined to believe the concurrency issues are now resolved. Still, a second (and third and even a forth) look by someone else would be appreciated.

        I'll commit it tomorrow if no one will object, and port to 4x.

        Show
        Shai Erera added a comment - Fixing the concurrency issue was hairy, and required lots of changes to DirTaxoWriter: Needed a ReaderManager, so added such in core under o.a.l.index. Separately, I think that we should move RefManager to o.a.l.util instead of o.a.l.search. DirTaxoWriter was not very well built for concurrency , so many changes had to be done to it. TaxoWriterCache.hasRoom(int) has been replaced by isFull(). TestDirTaxoWriter has been enhanced to sometimes, during nightly builds, use a NoOpCache, as it uncovered some bugs too ! (yet it makes the test horribly slow, hence why the nightly criteria, and very low chances still). I ran DirTaxoWriter.testConcurrency over 1000 times and no failures, so I'm inclined to believe the concurrency issues are now resolved. Still, a second (and third and even a forth) look by someone else would be appreciated. I'll commit it tomorrow if no one will object, and port to 4x.
        Hide
        Simon Willnauer added a comment -

        patch looks good. I wonder if you can't create the ReaderManager in advance and make it final. I mean if you do add categories which seems to be the purpose of that writer you need it anyway and the costs should be considerably low. That would remove the need for locking on it entirely.

        Show
        Simon Willnauer added a comment - patch looks good. I wonder if you can't create the ReaderManager in advance and make it final. I mean if you do add categories which seems to be the purpose of that writer you need it anyway and the costs should be considerably low. That would remove the need for locking on it entirely.
        Hide
        Shai Erera added a comment -

        Believe me, I wanted to avoid it too, but ReaderManager is allocated like that for few reasons:

        • It's lazy, as the comment in the code says – it's a waste to open an IR if your DirTaxoWriter session is going to be short living.
          • Personally, I think this is a minor issue, and if it were only that, I'd make it final.
        • The TaxoWriterCache can be 'complete' which means all the categories currently known to DirTW are cached. In that case, it is a waste to keep the reader open and we close it.
          • This is true for Cl2oCache, since it keeps all categories in memory.
          • But LruCache is not like that, since it potentially evicts entries from the cache. So it can be 'complete' until it evicts the first entry, in which case it will never be complete, and we'll need to keep the reader open.

        Currently, when we don't need ReaderManager, we close it. We also don't open it until few cache misses occur. To change it would mean to sacrifice efficiency by always keeping a Reader open, even if it's not needed. It wastes RAM, file handles and what not.

        Not sure it's worth it. What do you think?

        Show
        Shai Erera added a comment - Believe me, I wanted to avoid it too, but ReaderManager is allocated like that for few reasons: It's lazy, as the comment in the code says – it's a waste to open an IR if your DirTaxoWriter session is going to be short living. Personally, I think this is a minor issue, and if it were only that, I'd make it final. The TaxoWriterCache can be 'complete' which means all the categories currently known to DirTW are cached. In that case, it is a waste to keep the reader open and we close it. This is true for Cl2oCache, since it keeps all categories in memory. But LruCache is not like that, since it potentially evicts entries from the cache. So it can be 'complete' until it evicts the first entry, in which case it will never be complete, and we'll need to keep the reader open. Currently, when we don't need ReaderManager, we close it. We also don't open it until few cache misses occur. To change it would mean to sacrifice efficiency by always keeping a Reader open, even if it's not needed. It wastes RAM, file handles and what not. Not sure it's worth it. What do you think?
        Hide
        Simon Willnauer added a comment -

        Not sure it's worth it. What do you think?

        given that I think its ok to have the locking and init it lazy. +1 for the patch. go for it!

        Show
        Simon Willnauer added a comment - Not sure it's worth it. What do you think? given that I think its ok to have the locking and init it lazy. +1 for the patch. go for it!
        Hide
        Shai Erera added a comment -

        Thanks Simon for reviewing the patch. I'll commit this then and port to 4x.

        Show
        Shai Erera added a comment - Thanks Simon for reviewing the patch. I'll commit this then and port to 4x.
        Hide
        Shai Erera added a comment -

        Committed rev 1349214 (trunk) and 1349223 (4x).

        Show
        Shai Erera added a comment - Committed rev 1349214 (trunk) and 1349223 (4x).
        Hide
        Shai Erera added a comment -

        Jenkins failed with this stack trace:

        Stack Trace:
        java.lang.AssertionError: mismatch number of categories
        expected:<21684> but was:<22301>
        at __randomizedtesting.SeedInfo.seed([FA7ED469F531418D:5C8E16F49462A144]:0)
        at org.junit.Assert.fail(Assert.java:93)
        at org.junit.Assert.failNotEquals(Assert.java:647)
        at org.junit.Assert.assertEquals(Assert.java:128)
        at org.junit.Assert.assertEquals(Assert.java:472)
        at org.apache.lucene.facet.taxonomy.directory.TestDirectoryTaxonomyWriter.testConcurrency(TestDirectoryTaxonomyWriter.java:263)

        reproduce with: ant test -Dtestcase=TestDirectoryTaxonomyWriter -Dtests.method=testConcurrency -Dtests.seed=FA7ED469F531418D -Dtests.multiplier=3
        -Dtests.nightly=true -Dtests.linedocsfile=/home/hudson/lucene-data/enwiki.random.lines.txt -Dtests.locale=sk_SK -Dtests.timezone=Africa/Nairobi
        -Dargs="-Dfile.encoding=ISO8859-1"

        I still wasn't able to reproduce it on my machine. Digging into this.

        Show
        Shai Erera added a comment - Jenkins failed with this stack trace: Stack Trace: java.lang.AssertionError: mismatch number of categories expected:<21684> but was:<22301> at __randomizedtesting.SeedInfo.seed( [FA7ED469F531418D:5C8E16F49462A144] :0) at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.apache.lucene.facet.taxonomy.directory.TestDirectoryTaxonomyWriter.testConcurrency(TestDirectoryTaxonomyWriter.java:263) reproduce with: ant test -Dtestcase=TestDirectoryTaxonomyWriter -Dtests.method=testConcurrency -Dtests.seed=FA7ED469F531418D -Dtests.multiplier=3 -Dtests.nightly=true -Dtests.linedocsfile=/home/hudson/lucene-data/enwiki.random.lines.txt -Dtests.locale=sk_SK -Dtests.timezone=Africa/Nairobi -Dargs="-Dfile.encoding=ISO8859-1" I still wasn't able to reproduce it on my machine. Digging into this.
        Hide
        Shai Erera added a comment -

        Some progress ... still wasn't able to reproduce locally, but I did realize that the test was run with -Dtests.nightly, and with the seed it uses NoOpCache (which never caches anything). Progress is that I thought it's related to LruTaxoWriterCache all along ! .

        Still digging.

        Show
        Shai Erera added a comment - Some progress ... still wasn't able to reproduce locally, but I did realize that the test was run with -Dtests.nightly, and with the seed it uses NoOpCache (which never caches anything). Progress is that I thought it's related to LruTaxoWriterCache all along ! . Still digging.
        Hide
        Shai Erera added a comment -

        Found it. There was a problem with instructions order – upon adding a new category, we mark that the reader should be refreshed after we add the category to the cache. When NoOpCache is used, adding to the cache fails, and we fail to refresh the reader because the flag is set to true only afterwards. I added a simple deterministic test that verifies it. Will commit the fix shortly.

        BTW, NoOpCache in this case simulates the case where e.g. LruTaxoWriterCache just evicts entries from it and we fail to refresh the reader, so this is a true bug and not related to the use of NoOpCache.

        Hopefully this will be the end of it .

        Show
        Shai Erera added a comment - Found it. There was a problem with instructions order – upon adding a new category, we mark that the reader should be refreshed after we add the category to the cache. When NoOpCache is used, adding to the cache fails, and we fail to refresh the reader because the flag is set to true only afterwards. I added a simple deterministic test that verifies it. Will commit the fix shortly. BTW, NoOpCache in this case simulates the case where e.g. LruTaxoWriterCache just evicts entries from it and we fail to refresh the reader, so this is a true bug and not related to the use of NoOpCache. Hopefully this will be the end of it .
        Hide
        Shai Erera added a comment -

        Committed revisions 1351672 and 1351675.

        Show
        Shai Erera added a comment - Committed revisions 1351672 and 1351675.

          People

          • Assignee:
            Shai Erera
            Reporter:
            Shai Erera
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development