Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.1, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      CheckIndex is great for testing and when tracking down lucene bugs.

      But in cases where users just want to verify their index files are OK, it is very slow and expensive.

      I think we should add a -fast option, that only opens the reader and calls checkIntegrity(). This means all files are the correct files (identifiers match) and have the correct CRC32 checksums.

      For our 10M doc wikipedia index, this is the difference between a 2 second check and a 2 minute check.

        Activity

        Hide
        Robert Muir added a comment -

        Here is an initial patch.

        Show
        Robert Muir added a comment - Here is an initial patch.
        Hide
        Robert Muir added a comment -

        For trunk, i think something like this patch is ok? The user gets a pretty solid check with the option, and its fast.

        On 5.x i have concerns. For lucene 4.0-4.7 segments, which are supported there, the option is kind of trappy, maybe just a warning beside the option's usage() in 5.x branch is ok since its not set by default anyway?

        Show
        Robert Muir added a comment - For trunk, i think something like this patch is ok? The user gets a pretty solid check with the option, and its fast. On 5.x i have concerns. For lucene 4.0-4.7 segments, which are supported there, the option is kind of trappy, maybe just a warning beside the option's usage() in 5.x branch is ok since its not set by default anyway?
        Hide
        Michael McCandless added a comment -

        This is awesome.

        +1 for patch and 5.x.

        Does -verbose work with -fast? I think it should (we seem to do null checks for all the terms dict stats), maybe add a test? It's a nice (fast0 way to see RAM usage for the index...

        Does -exorcise and -fast work?

        In the usage can you also confess that the identifiers are also "cross-checked"?

        Show
        Michael McCandless added a comment - This is awesome. +1 for patch and 5.x. Does -verbose work with -fast? I think it should (we seem to do null checks for all the terms dict stats), maybe add a test? It's a nice (fast0 way to see RAM usage for the index... Does -exorcise and -fast work? In the usage can you also confess that the identifiers are also "cross-checked"?
        Hide
        Robert Muir added a comment -

        Does -verbose work with -fast?

        yes

        Does -exorcise and -fast work?

        yes

        In the usage can you also confess that the identifiers are also "cross-checked"?

        ???? Reader open is doing this check.

        Show
        Robert Muir added a comment - Does -verbose work with -fast? yes Does -exorcise and -fast work? yes In the usage can you also confess that the identifiers are also "cross-checked"? ???? Reader open is doing this check.
        Hide
        Michael McCandless added a comment -

        ???? Reader open is doing this check.

        Right, I just mean the usage output ("-fast: just verify file checksums, omitting logical integrity checks") is selling this option short because we do more than just verify file checksums. Maybe it could say something like "only perform fast verification such as file checksums, segment ids are consistent, etc."?

        Show
        Michael McCandless added a comment - ???? Reader open is doing this check. Right, I just mean the usage output ("-fast: just verify file checksums, omitting logical integrity checks") is selling this option short because we do more than just verify file checksums. Maybe it could say something like "only perform fast verification such as file checksums, segment ids are consistent, etc."?
        Hide
        Robert Muir added a comment -

        My concerns are that it oversells it

        The checks it does are all codec specific.

        Show
        Robert Muir added a comment - My concerns are that it oversells it The checks it does are all codec specific.
        Hide
        Michael McCandless added a comment -

        OK I'm fine w/ leaving it as is.

        Show
        Michael McCandless added a comment - OK I'm fine w/ leaving it as is.
        Hide
        Robert Muir added a comment -

        This is related to my concerns about the option on 4.x segments where it does less or nothing at all...

        Show
        Robert Muir added a comment - This is related to my concerns about the option on 4.x segments where it does less or nothing at all...
        Hide
        Robert Muir added a comment -

        I am so frustrated with the back compat, combined with -exorcise option, which traps us.

        I'll make the change for trunk only.

        5.x can have a fast checkindex when something gives.

        Show
        Robert Muir added a comment - I am so frustrated with the back compat, combined with -exorcise option, which traps us. I'll make the change for trunk only. 5.x can have a fast checkindex when something gives.
        Hide
        Robert Muir added a comment -

        I added a second test that turns on verbose, too

        Show
        Robert Muir added a comment - I added a second test that turns on verbose, too
        Hide
        ASF subversion and git services added a comment -

        Commit 1664633 from Robert Muir in branch 'dev/trunk'
        [ https://svn.apache.org/r1664633 ]

        LUCENE-6341: Add a -fast option to CheckIndex

        Show
        ASF subversion and git services added a comment - Commit 1664633 from Robert Muir in branch 'dev/trunk' [ https://svn.apache.org/r1664633 ] LUCENE-6341 : Add a -fast option to CheckIndex
        Hide
        Robert Muir added a comment -

        I think I know of a good solution for backporting.

        My problem is here at the end:

            if (result.clean) {
              msg(infoStream, "No problems were detected with this index.\n");
            }
        

        I think if the -fast option is used, and we find segments older than 5.0 we can emit a warning right here. If we find any older than 4.8 it could be a much stronger warning. In either case it just recommends to do a full checkindex to be sure.

        Show
        Robert Muir added a comment - I think I know of a good solution for backporting. My problem is here at the end: if (result.clean) { msg(infoStream, "No problems were detected with this index.\n" ); } I think if the -fast option is used, and we find segments older than 5.0 we can emit a warning right here. If we find any older than 4.8 it could be a much stronger warning. In either case it just recommends to do a full checkindex to be sure.
        Hide
        Michael McCandless added a comment -

        I think if the -fast option is used, and we find segments older than 5.0 we can emit a warning right here. If we find any older than 4.8 it could be a much stronger warning. In either case it just recommends to do a full checkindex to be sure.

        +1

        Show
        Michael McCandless added a comment - I think if the -fast option is used, and we find segments older than 5.0 we can emit a warning right here. If we find any older than 4.8 it could be a much stronger warning. In either case it just recommends to do a full checkindex to be sure. +1
        Hide
        ASF subversion and git services added a comment -

        Commit 1665064 from Robert Muir in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1665064 ]

        LUCENE-6341: Add a -fast option to CheckIndex

        Show
        ASF subversion and git services added a comment - Commit 1665064 from Robert Muir in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1665064 ] LUCENE-6341 : Add a -fast option to CheckIndex
        Hide
        ASF subversion and git services added a comment -

        Commit 1665065 from Robert Muir in branch 'dev/trunk'
        [ https://svn.apache.org/r1665065 ]

        LUCENE-6341: move CHANGES entry in trunk

        Show
        ASF subversion and git services added a comment - Commit 1665065 from Robert Muir in branch 'dev/trunk' [ https://svn.apache.org/r1665065 ] LUCENE-6341 : move CHANGES entry in trunk
        Hide
        Timothy Potter added a comment -

        Bulk close after 5.1 release

        Show
        Timothy Potter added a comment - Bulk close after 5.1 release

          People

          • Assignee:
            Unassigned
            Reporter:
            Robert Muir
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development