|
Replacing the patch file (prev file was garbage - "svn stat" instead of "svn diff").
Few words on how this patch works:
Tests:
Remaining:
Seems like a good idea... given that norms are read once on-demand, I wouldn't expect anything search related to be slower with this. Opening a new reader should actually be slightly faster due to fewer files to open.
> - CFS now also maintains all norms in a single file.
Does this mean a separate file outside the final .cfs files? > Does this mean a separate file outside the final .cfs files?
Oh no - there's a single .nrm file in the .cfs file (instead of multiple .fN files in the .cfs file). Since we're adding a new file, shouldn't we give it a header, so that it's format can be revised? Something like:
new byte[] {'N','R','M',VERSION} as the first four bytes. We might someday decide to change the representation used, e.g., a different one-byte-float format, or permit higher resolution, or compression, or somesuch. Also, should we use a constant for ".nrm" extension, so that it's checked at compile-time? Thanks for the comments, Doug.
You're right of course, I will add both the header and the constant. (that would be either today or only in a week from now.) nrm.patch.2.txt:
Updated as Doug suggested:
And, fileFormat document is updated. Also, I checked again that the seeks for the various field norms are lazy - performed only when bytes are actually read with refill(). I am updating the patch (nrm.patch.3.txt):
All tests pass. Thanks for commiting this Yonik!
Seems the added test TestNorms was not commited..? Hmmm, I actually did an "svn status" to see if there was anything to add too.
Problem is, my current tree is too messy and I missed it. Thanks for the double-check. I would like to propose some small improvements to this nice feature.
I've worked out a patch (will attach shortly). Doron if you agree / Proposed changes:
I think in general we should favor storing things like this I created a new FORMAT_MERGED_NORMS in SegmentInfos for this. The This then has the nice side effect of not having to create the
I agree that reducing the IO operations on an index open is a good thing.
> For indices written to before this gets No hard rule on this, but IMO that may be a small enough window that compatibility is not needed. > No hard rule on this, but IMO that may be a small enough window that compatibility is not needed.
This is a good question. I had flip/flop'd on it. It would be nice Any objections to this? If not I will re-work the patch (it makes things a fair bit cleaner). As an aside, I think we need to start making more frequent releases... then "trunk" could be designated as a work-in-progress and unstable, and hence compatibility concerns could be limited to those releases.
OK, take two! I attached
I removed backwards compatibility for the past 10 days of Lucene I may have the only app that will be broken by the 10-day backwards incompatibility, but the change seems worth it. I need to create some large indexes to take on the road for demos. Is the index format in the latest patch final?
Actually, if you apply my first change above, regen your index, then the format will be readable to the 2nd patch.
Chuck, I think this latest patch would likely be the "final" index file format for this issue, pending any more feedback on it though! Michael, I like this improvement!
(At first I considered adding such FORMAT level but decided that it is not worth it, - aiming backwards compatibility with pre-lockless indexes. Then I had to add that file check - wrong trade-off indeed.) Two minor comments:
Thanks for improving this, > the term "merged" (in hasMergedNorms) is a little overloaded with other semantics (in Lucene)
Unified? Single? Just to let you know - I checked this with recent patch for Lucene-741 (Field norm modifier) --> working as is with this improvement.
OK thanks Doron. I will make the fixes you suggested!
I like "single" – I will redo the "non backwards compatible for past 10 days" patch with these fixes! OK I committed the fix (changed the name to "singleNormFile"). Thanks
everyone! Closing all issues that were resolved for 2.1.
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Modification is backwards compatible - existing indexes with norms in a file per norm are read. - the first merge would create a single .nrm file.
All tests pass.
No performance degtadations were observed as result of this change, but my tests so far were not very extensive.