Lucene - Core / LUCENE-5131

CheckIndex is confusing for docvalues fields

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.5, 5.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      CheckIndex prints things like:

      test: docvalues.......OK [0 total doc count; 18 docvalues fields]
      
      1. LUCENE-5131.patch
        0.9 kB
        Robert Muir
      2. LUCENE-5131.patch
        4 kB
        Robert Muir

        Activity

        Adrien Grand added a comment -

        4.5 release -> bulk close

        ASF subversion and git services added a comment -

        Commit 1506968 from Robert Muir in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1506968 ]

        LUCENE-5131: CheckIndex is confusing for docvalues fields

        ASF subversion and git services added a comment -

        Commit 1506964 from Robert Muir in branch 'dev/trunk'
        [ https://svn.apache.org/r1506964 ]

        LUCENE-5131: CheckIndex is confusing for docvalues fields

        Adrien Grand added a comment -

        Definitely +1 for this patch and printing statistics about unique value counts for SORTED and SORTED_SET.

        Robert Muir added a comment -

        OK, here are a few more details. I tried to keep it from getting too verbose as a start.

           [junit4]   1>   3 of 5: name=_2 docCount=10
           [junit4]   1>     codec=Lucene42
           [junit4]   1>     compound=false
           [junit4]   1>     numFiles=15
           [junit4]   1>     size (MB)=0.008
           [junit4]   1>     diagnostics = {timestamp=1362970606621, os=Linux, os.version=3.5.0-23-generic, source=flush, lucene.version=4.2-SNAPSHOT, os.arch=amd64, java.version=1.7.0_09, java.vendor=Oracle Corporation}
           [junit4]   1>     no deletions
           [junit4]   1>     test: open reader.........OK
           [junit4]   1>     test: fields..............OK [24 fields]
           [junit4]   1>     test: field norms.........OK [7 fields]
           [junit4]   1>     test: terms, freq, prox...OK [83 terms; 560 terms/docs pairs; 370 tokens]
           [junit4]   1>     test: stored fields.......OK [70 total field count; avg 7 fields per doc]
           [junit4]   1>     test: term vectors........OK [60 total vector count; avg 6 term/freq vector fields per doc]
           [junit4]   1>     test: docvalues...........OK [14 docvalues fields; 4 BINARY; 7 NUMERIC; 2 SORTED; 1 SORTED_SET]
        

        I'll see if I can take a stab at per-field stuff to print with -verbose.
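
        As a rough illustration of the per-type summary shown in the docvalues line above, here is a minimal standalone sketch. It does not use Lucene and is not the code from the patch; the class DocValuesSummary, the DvType enum, and the sample field names are all hypothetical, chosen only to mirror the BINARY/NUMERIC/SORTED/SORTED_SET breakdown.

```java
import java.util.EnumMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class DocValuesSummary {
    // Hypothetical stand-in for the doc values types named in this issue.
    public enum DvType { BINARY, NUMERIC, SORTED, SORTED_SET }

    // Tallies fields per type and formats one summary line,
    // e.g. "3 docvalues fields; 0 BINARY; 2 NUMERIC; 1 SORTED; 0 SORTED_SET".
    public static String summarize(Map<String, DvType> fieldTypes) {
        Map<DvType, Integer> counts = new EnumMap<>(DvType.class);
        for (DvType type : DvType.values()) {
            counts.put(type, 0); // start every type at zero so it always prints
        }
        for (DvType type : fieldTypes.values()) {
            counts.merge(type, 1, Integer::sum);
        }
        StringBuilder sb = new StringBuilder();
        sb.append(fieldTypes.size()).append(" docvalues fields");
        for (DvType type : DvType.values()) {
            sb.append("; ").append(counts.get(type)).append(' ').append(type);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample field names are made up for the example.
        Map<String, DvType> fields = new LinkedHashMap<>();
        fields.put("price", DvType.NUMERIC);
        fields.put("timestamp", DvType.NUMERIC);
        fields.put("category", DvType.SORTED);
        System.out.println(summarize(fields));
    }
}
```

        Reporting only the field total and the per-type counts keeps the default output to one line, which leaves any per-field detail for -verbose.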

        Michael McCandless added a comment -

        +1 to the patch and +1 to print more details!

        Robert Muir added a comment -

        Here's a patch to remove the doc count.

        I think actually we should make this better, and summarize info by dv-type?

        We could also (maybe only with -verbose?) print things like valueCount() for the Sorted/SortedSet fields so you know how many unique values there are (sorta like printing how many terms are in the terms dict).


          People

          • Assignee: Unassigned
          • Reporter: Robert Muir
          • Votes: 0
          • Watchers: 3
