Lucene - Core
  1. Lucene - Core
  2. LUCENE-3596

DirectoryTaxonomyWriter extensions should be able to set internal index writer config attributes such as info stream

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: modules/facet
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Current protected openIndexWriter(Directory directory, OpenMode openMode) does not provide access to the IWC it creates.
      So extensions must reimplement this method completely in order to set e.f. info stream for the internal index writer.
      This came up in user question: Taxonomy indexer debug

      1. LUCENE-3596.patch
        6 kB
        Doron Cohen
      2. LUCENE-3596.patch
        2 kB
        Doron Cohen

        Activity

        Hide
        Shai Erera added a comment -

        We should be careful about this though. DirTW requires LogMergePolicy and won't work well with TieredMP, or any other out-of-order-merges MP. We should clearly note that in the javadocs of both openIndexWriter, and getIWC (if you intend to add it).

        Other than that, I wonder why was infoStream moved to IWC and made final in IW. But I will discuss that on the dev@ list.

        Show
        Shai Erera added a comment - We should be careful about this though. DirTW requires LogMergePolicy and won't work well with TieredMP, or any other out-of-order-merges MP. We should clearly note that in the javadocs of both openIndexWriter, and getIWC (if you intend to add it). Other than that, I wonder why was infoStream moved to IWC and made final in IW. But I will discuss that on the dev@ list.
        Hide
        Doron Cohen added a comment -

        and getIWC (if you intend to add it).

        Yes that's what I would like to add.
        These docs are missing then anyhow, with or without getIWC().
        This added extendability is useful although behavior regarding info-stream differs between trunk and 3x - i.e. that in 3x one can set that stream also with current extension point.

        Show
        Doron Cohen added a comment - and getIWC (if you intend to add it). Yes that's what I would like to add. These docs are missing then anyhow, with or without getIWC(). This added extendability is useful although behavior regarding info-stream differs between trunk and 3x - i.e. that in 3x one can set that stream also with current extension point.
        Hide
        Doron Cohen added a comment -

        patch adds the method createIndexWriterConfig(OpenMode openMode) and javadocs for in-order segments merging.

        Show
        Doron Cohen added a comment - patch adds the method createIndexWriterConfig(OpenMode openMode) and javadocs for in-order segments merging.
        Hide
        Doron Cohen added a comment -

        Also, there seems to be a bug in current taxonomy writer test - TestIndexClose - where the IndexWriterConfig's merge policy might allow to merge segments out-of-order. That test calls LTC.newIndexWriterConfig() and it is just by luck that this test have not failed so far.

        This is a bad type of failure for an application (is there ever a good type?), because by the time the bug is exposed it would show as a wrong facet returned in faceted search, and go figure that late that this is because at an earlier time an index writer was created which allowed out-of-order merging...

        Therefore, it would have been useful if, in addition to the javadocs about requiring type of merge policy, we would also throw an exception (IllegalArgument or IO) if the IWC's merge policy allows merging out-of-order. This should be checked in two locations:

        • createIWC() returns
        • openIndex() returns, by examining the IWC of the index

        The second check is more involved as it is done after the index was already opened, so it must be closed prior to throwing that exception.

        However, merge-policy does not have in its "contract" anything like Collector.acceptsDocsOutOfOrder(), so it is not possible to verify this at all.

        Adding such a method to MergePolicy seems to me an over-kill, for this particular case, unless there is additional interest in such a declaration?

        Otherwise, it is possible to require that the merge policy must be a descendant of LogMergePolicy. This on the other hand would not allow to test this class with other order-preserving policies, such as NoMerge.

        So I am not sure what is the best way to proceed in this regard.

        I think there are two options actually:

        1. just javadoc that fact, and fix the test to always create an order preserving MP.
        2. add that declaration to MP.

        Unless there are opinions favoring the second option I'll go with the first one.

        In addition, (this is true for both options) I will move the call to createIWC into the constructor and modify openIndex signature to accept an IWC instead of the open mode, as it seems wrong - API wise - that one extension point (createIWC) is invoked by another extension point (openIndex) - better have them both be invoked from the constructor, making it harder for someone to, by mistake, totally ignore in createIndex() the value returned by createIWC().

        Show
        Doron Cohen added a comment - Also, there seems to be a bug in current taxonomy writer test - TestIndexClose - where the IndexWriterConfig's merge policy might allow to merge segments out-of-order. That test calls LTC.newIndexWriterConfig() and it is just by luck that this test have not failed so far. This is a bad type of failure for an application (is there ever a good type? ), because by the time the bug is exposed it would show as a wrong facet returned in faceted search, and go figure that late that this is because at an earlier time an index writer was created which allowed out-of-order merging... Therefore, it would have been useful if, in addition to the javadocs about requiring type of merge policy, we would also throw an exception (IllegalArgument or IO) if the IWC's merge policy allows merging out-of-order. This should be checked in two locations: createIWC() returns openIndex() returns, by examining the IWC of the index The second check is more involved as it is done after the index was already opened, so it must be closed prior to throwing that exception. However, merge-policy does not have in its "contract" anything like Collector.acceptsDocsOutOfOrder(), so it is not possible to verify this at all. Adding such a method to MergePolicy seems to me an over-kill, for this particular case, unless there is additional interest in such a declaration? Otherwise, it is possible to require that the merge policy must be a descendant of LogMergePolicy. This on the other hand would not allow to test this class with other order-preserving policies, such as NoMerge. So I am not sure what is the best way to proceed in this regard. I think there are two options actually: just javadoc that fact, and fix the test to always create an order preserving MP. add that declaration to MP. Unless there are opinions favoring the second option I'll go with the first one. In addition, (this is true for both options) I will move the call to createIWC into the constructor and modify openIndex signature to accept an IWC instead of the open mode, as it seems wrong - API wise - that one extension point (createIWC) is invoked by another extension point (openIndex) - better have them both be invoked from the constructor, making it harder for someone to, by mistake, totally ignore in createIndex() the value returned by createIWC().
        Hide
        Shai Erera added a comment -

        I will move the call to createIWC into the constructor and modify openIndex signature to accept an IWC instead of the open mode

        +1. This makes sense.

        just javadoc that fact, and fix the test to always create an order preserving MP.

        I agree. The options are not very appealing and we might work too hard to prevent something that will never happen. The root cause of this issue was the inability to turn on IW's infoStream (which IMO should be changed to a proper logging framework). Besides our tests, I don't think many will override createIWC / openIndexWriter for changing other attributes.

        Show
        Shai Erera added a comment - I will move the call to createIWC into the constructor and modify openIndex signature to accept an IWC instead of the open mode +1. This makes sense. just javadoc that fact, and fix the test to always create an order preserving MP. I agree. The options are not very appealing and we might work too hard to prevent something that will never happen. The root cause of this issue was the inability to turn on IW's infoStream (which IMO should be changed to a proper logging framework). Besides our tests, I don't think many will override createIWC / openIndexWriter for changing other attributes.
        Hide
        Doron Cohen added a comment -

        Patch taking approach (1) above, and moving createIWC() to constructor.

        In addition fixed some javadoc comments, and added an assert to the constructor, which, only when assertions are enabled, will verify that the IWC in effect is not an instance of TieredMergePolicy. Imperfect as this is, it at least exposed the problem in current test (fixed to use newLogMP()).

        I think this is ready to commit.

        Show
        Doron Cohen added a comment - Patch taking approach (1) above, and moving createIWC() to constructor. In addition fixed some javadoc comments, and added an assert to the constructor, which, only when assertions are enabled, will verify that the IWC in effect is not an instance of TieredMergePolicy. Imperfect as this is, it at least exposed the problem in current test (fixed to use newLogMP()). I think this is ready to commit.
        Hide
        Shai Erera added a comment -

        Patch looks good. +1 to commit.

        Show
        Shai Erera added a comment - Patch looks good. +1 to commit.
        Hide
        Doron Cohen added a comment -

        Thanks for reviewing Shai.

        Committed:

        • r1206996 - trunk
        • r1207008 - 3x

        CHANGES.txt entry is only in 3x because in trunk facet is under modules. I don't like this difference, but there it is.

        Show
        Doron Cohen added a comment - Thanks for reviewing Shai. Committed: r1206996 - trunk r1207008 - 3x CHANGES.txt entry is only in 3x because in trunk facet is under modules. I don't like this difference, but there it is.

          People

          • Assignee:
            Doron Cohen
            Reporter:
            Doron Cohen
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development