Lucene - Core
  1. Lucene - Core
  2. LUCENE-1401

Deprecation of autoCommit in 2.4 leads to compile problems, when autoCommit should be false

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Trivial Trivial
    • Resolution: Fixed
    • Affects Version/s: 2.4, 2.9
    • Fix Version/s: 2.4, 2.9
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      I am currently changing my code to be most compatible with 2.4. I switched on deprecation warnings and got a warning about the autoCommit parameter in IndexWriter constructors.

      My code should use autoCommit=false, so I want to use the new semantics. The default of IndexWriter is still autoCommit=true. My problem now: How to disable autoCommit whithout deprecation warnings?

      Maybe, the "old" constructors, that are deprecated should use autoCommit=true. But there are new constructors with this "IndexWriter.MaxFieldLength mfl" in it, that appear new in 2.4 but are deprecated:

      IndexWriter(Directory d, boolean autoCommit, Analyzer a, boolean create, IndexDeletionPolicy deletionPolicy, IndexWriter.MaxFieldLength mfl)
      Deprecated. This will be removed in 3.0, when autoCommit will be hardwired to false. Use IndexWriter(Directory,Analyzer,boolean,IndexDeletionPolicy,MaxFieldLength) instead, and call commit() when needed.

      What the hell is meant by this, a new constructor that is deprecated? And the hint is wrong. If I use the other constructor in the warning, I get autoCommit=true.

      There is something completely wrong.

      It should be clear, which constructors set autoCommit=true, which set it per default to false (perhaps new ones), and the Deprecated text is wrong, if autoCommit does not default to false.

      1. LUCENE-1401.patch
        73 kB
        Michael McCandless

        Activity

        Uwe Schindler created issue -
        Hide
        Michael McCandless added a comment -

        Achieving the migration from autoCommit=true to autoCommit=false is somewhat tricky. As things stand now, all ctors that don't take autoCommit param still default autoCommit to true.

        Maybe we could flip the autoCommit default to false, now, with the new ctors (the ones that take a MaxFieldLength). This may be better since you have to explicitly update your code, anyway, to switch to IndexWriter's new ctors and so if we call this change out in the javadocs, users are more likely to catch it. Whereas if we suddenly flip the default for autoCommit to false in 3.0, since there's no API signature change, users may not realize this had happened.

        OK I like that approach better. Are there any objections?

        But there are new constructors with this "IndexWriter.MaxFieldLength mfl" in it, that appear new in 2.4 but are deprecated

        I agree: we should not introduce new deprecated ctors. I'll eliminate these. This happened because there were two separate changes (addition of MaxFieldLength, and, deprecation of autoCommit).

        Show
        Michael McCandless added a comment - Achieving the migration from autoCommit=true to autoCommit=false is somewhat tricky. As things stand now, all ctors that don't take autoCommit param still default autoCommit to true. Maybe we could flip the autoCommit default to false, now, with the new ctors (the ones that take a MaxFieldLength). This may be better since you have to explicitly update your code, anyway, to switch to IndexWriter's new ctors and so if we call this change out in the javadocs, users are more likely to catch it. Whereas if we suddenly flip the default for autoCommit to false in 3.0, since there's no API signature change, users may not realize this had happened. OK I like that approach better. Are there any objections? But there are new constructors with this "IndexWriter.MaxFieldLength mfl" in it, that appear new in 2.4 but are deprecated I agree: we should not introduce new deprecated ctors. I'll eliminate these. This happened because there were two separate changes (addition of MaxFieldLength, and, deprecation of autoCommit).
        Michael McCandless made changes -
        Field Original Value New Value
        Assignee Michael McCandless [ mikemccand ]
        Michael McCandless made changes -
        Fix Version/s 2.9 [ 12312682 ]
        Fix Version/s 2.4 [ 12312681 ]
        Affects Version/s 2.9 [ 12312682 ]
        Hide
        Michael McCandless added a comment -

        Attached patch that removes the new deprecated ctors, and sets autoCommit=false for the new ctors (that take MaxFieldLength).

        The bulk of the patch is fixing all places where we were calling the new deprecated ctors.

        Show
        Michael McCandless added a comment - Attached patch that removes the new deprecated ctors, and sets autoCommit=false for the new ctors (that take MaxFieldLength). The bulk of the patch is fixing all places where we were calling the new deprecated ctors.
        Michael McCandless made changes -
        Attachment LUCENE-1401.patch [ 12390789 ]
        Hide
        Uwe Schindler added a comment -

        This patch seems to work, the IndexWriters created by the MaxFieldLength ctors are with autocommit=false, I have seen this, because the segment file does not change during indexing.

        There is on small thing (was also there before your patch):
        I use writer.setUseCompoundFile(true) to use compound files (which is also the default). It generates normally always only CFS files (on index creation, when optimizing,...). There is only one use case, when cfs and cfx files are generated:

        • Use IndexWriter with create=true
        • add documents to the index
        • optimize the index (without closing in between)

        After that the optimized index contains of one cfs and one cfx. During indexing (before optimization), I always see only cfs files for new segments (and for short times as usual the contents tfx,...).

        When optimizing the index after closing it or later after adding documents, i got only one cfs file.

        Two questions:

        • Is this a small bug, which would be not release critical - but it is strange?
        • How can I enable creation of doc store (cfx) and cfs always, I found nothing in the docs. In my opinion the separate cfs/cfx files are good for search performance (right?).
        Show
        Uwe Schindler added a comment - This patch seems to work, the IndexWriters created by the MaxFieldLength ctors are with autocommit=false, I have seen this, because the segment file does not change during indexing. There is on small thing (was also there before your patch): I use writer.setUseCompoundFile(true) to use compound files (which is also the default). It generates normally always only CFS files (on index creation, when optimizing,...). There is only one use case, when cfs and cfx files are generated: Use IndexWriter with create=true add documents to the index optimize the index (without closing in between) After that the optimized index contains of one cfs and one cfx. During indexing (before optimization), I always see only cfs files for new segments (and for short times as usual the contents tfx,...). When optimizing the index after closing it or later after adding documents, i got only one cfs file. Two questions: Is this a small bug, which would be not release critical - but it is strange? How can I enable creation of doc store (cfx) and cfs always, I found nothing in the docs. In my opinion the separate cfs/cfx files are good for search performance (right?).
        Hide
        Michael McCandless added a comment -

        That (cfx/cfs file creation) is actually "normal" behavior for
        Lucene.

        With autoCommit=false, in a single session of IndexWriter, Lucene
        will share the doc store files (stored fields, term vectors) across
        multiple segments. This saves alot of merge time because those files
        don't need to be merged if we are merging segments that all share the
        same doc store files. When building up a large index anew this saves
        alot of time.

        A cfx file is the compound-file format of the doc store files.

        However, when segments spanning multiple doc stores are merged, then
        the doc store files are in fact merged, and written privately for that
        one segment, and then folded into that segment's cfs file. When all
        such segments reference a given doc store segment are merged away,
        then that doc store segment is deleted.

        So it's currently only the "level 0" segments that may share a cfx
        file. As a future optimization we could consider extending Lucene's
        index format so that a single segment could reference multiple doc
        stores. This would require logic in FieldsReader and
        TermVectorsReader to do a binary search when locating which doc store
        segment holds a given document, but, would enable merging non level 0
        segments to skip having to merge the doc store. This is an invasive
        optimization.

        So you can't separately control when Lucene uses cfx file; it's the
        merge policy that indirectly controls this.

        Show
        Michael McCandless added a comment - That (cfx/cfs file creation) is actually "normal" behavior for Lucene. With autoCommit=false, in a single session of IndexWriter, Lucene will share the doc store files (stored fields, term vectors) across multiple segments. This saves alot of merge time because those files don't need to be merged if we are merging segments that all share the same doc store files. When building up a large index anew this saves alot of time. A cfx file is the compound-file format of the doc store files. However, when segments spanning multiple doc stores are merged, then the doc store files are in fact merged, and written privately for that one segment, and then folded into that segment's cfs file. When all such segments reference a given doc store segment are merged away, then that doc store segment is deleted. So it's currently only the "level 0" segments that may share a cfx file. As a future optimization we could consider extending Lucene's index format so that a single segment could reference multiple doc stores. This would require logic in FieldsReader and TermVectorsReader to do a binary search when locating which doc store segment holds a given document, but, would enable merging non level 0 segments to skip having to merge the doc store. This is an invasive optimization. So you can't separately control when Lucene uses cfx file; it's the merge policy that indirectly controls this.
        Hide
        Uwe Schindler added a comment -

        Thanks for the info, it did not know this!

        Show
        Uwe Schindler added a comment - Thanks for the info, it did not know this!
        Hide
        Michael McCandless added a comment -

        Committed revision 698932 on trunk (2.9) and 698933 on 2.4. Thanks Uwe!

        Show
        Michael McCandless added a comment - Committed revision 698932 on trunk (2.9) and 698933 on 2.4. Thanks Uwe!
        Michael McCandless made changes -
        Resolution Fixed [ 1 ]
        Status Open [ 1 ] Resolved [ 5 ]
        Michael McCandless made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Mark Thomas made changes -
        Workflow jira [ 12442645 ] Default workflow, editable Closed status [ 12562772 ]
        Mark Thomas made changes -
        Workflow Default workflow, editable Closed status [ 12562772 ] jira [ 12583882 ]

          People

          • Assignee:
            Michael McCandless
            Reporter:
            Uwe Schindler
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development