Solr
  1. Solr
  2. SOLR-2497

Move Solr to new NumericField stored field impl of LUCENE-3065

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.2, 4.0-ALPHA
    • Component/s: None
    • Labels:
      None

      Description

      This implements the changes to NumericField (LUCENE-3065) in Solr. TrieField & Co would use NumericField for indexing and reading stored fields. To enable this some missing changes in Solr's internals (Field -> Fieldable) need to be done. Also some backwards compatible stored fields parsing is needed to read pre-3.2 indexes without reindexing (as the format changed a little bit and Document.getFieldable returns NumericField instances now).

      1. SOLR-2497.patch
        34 kB
        Uwe Schindler
      2. SOLR-2497.patch
        34 kB
        Uwe Schindler
      3. SOLR-2497.patch
        30 kB
        Uwe Schindler
      4. SOLR-2497.patch
        29 kB
        Uwe Schindler
      5. SOLR-2497-trunk.patch
        36 kB
        Uwe Schindler

        Issue Links

          Activity

          Hide
          Robert Muir added a comment -

          Bulk close for 3.2

          Show
          Robert Muir added a comment - Bulk close for 3.2
          Hide
          Michael McCandless added a comment -

          Nice work Uwe! This is a great example of the refactoring made possible by the merger.

          Show
          Michael McCandless added a comment - Nice work Uwe! This is a great example of the refactoring made possible by the merger.
          Hide
          Uwe Schindler added a comment -

          Committed trunk revision: 1100526

          Show
          Uwe Schindler added a comment - Committed trunk revision: 1100526
          Hide
          Uwe Schindler added a comment -

          This is the patch for trunk, including the hack for LUCENE-3080. Will commit soon!

          Show
          Uwe Schindler added a comment - This is the patch for trunk, including the hack for LUCENE-3080 . Will commit soon!
          Hide
          Uwe Schindler added a comment -

          Committed 3.x revision: 1100480

          Now forward-porting to trunk...

          Show
          Uwe Schindler added a comment - Committed 3.x revision: 1100480 Now forward-porting to trunk...
          Hide
          Uwe Schindler added a comment -

          I think this is now ready to commit.

          Show
          Uwe Schindler added a comment - I think this is now ready to commit.
          Hide
          Uwe Schindler added a comment -

          Huper duper patch that passes all tests!

          In general the design of TrieField and its subclasses and the delegation in TrieDateField are somehow crazy. We should maybe remove the superclass TrieField completely and copy all code inside the imense switch statements to the subclasses/TrieDateField. I think the whole thing was done initially because the original committer (before 1.4) did not want to add lots of extra classes but shortly before release Yonik changed that to the current state.

          There was also inconsistency between TrieDateField and TrieField (with type=DATE), thats now solved. My problem came from the refactoring I did to get rid of this. TrieField implemented toExternal() and indexedToReadable() incorrect, I only fixed toExternal() but missed indexedToReadable(); thanks @ Chris Male.

          Show
          Uwe Schindler added a comment - Huper duper patch that passes all tests! In general the design of TrieField and its subclasses and the delegation in TrieDateField are somehow crazy. We should maybe remove the superclass TrieField completely and copy all code inside the imense switch statements to the subclasses/TrieDateField. I think the whole thing was done initially because the original committer (before 1.4) did not want to add lots of extra classes but shortly before release Yonik changed that to the current state. There was also inconsistency between TrieDateField and TrieField (with type=DATE), thats now solved. My problem came from the refactoring I did to get rid of this. TrieField implemented toExternal() and indexedToReadable() incorrect, I only fixed toExternal() but missed indexedToReadable(); thanks @ Chris Male.
          Hide
          Chris Male added a comment -

          Ah I didn't read the rest of the issue. Fair enough. I'm glad its now fixed.

          Show
          Chris Male added a comment - Ah I didn't read the rest of the issue. Fair enough. I'm glad its now fixed.
          Hide
          Uwe Schindler added a comment -

          Hi Chris,
          you are the best! Only your soliution was not correct, the change must be made in TrieField (wrappedField). The problem here was a bug in the original code (as of Lucene 1.4 / 3.1) [the same I already mentioned above]: If you created in your Schema a TrieField with type attribute set to DATE, it behaved differently than if you used a TrieDateField (same issue had been fixed before in toExternal:

          TrieDateField code duplication was removed, all methods delegate to a wrapped TrieField. There was also an inconsistency between TrieField and TrieDateField's toExternal(). This was fixed to work correct (the date format was wrong, now it uses dateField.toExternal())

          The correct fix is to make TrieField.indexedToReadable call DateField's toExternal internally (like toExternal now does).

          Show
          Uwe Schindler added a comment - Hi Chris, you are the best! Only your soliution was not correct, the change must be made in TrieField (wrappedField). The problem here was a bug in the original code (as of Lucene 1.4 / 3.1) [the same I already mentioned above] : If you created in your Schema a TrieField with type attribute set to DATE, it behaved differently than if you used a TrieDateField (same issue had been fixed before in toExternal: TrieDateField code duplication was removed, all methods delegate to a wrapped TrieField. There was also an inconsistency between TrieField and TrieDateField's toExternal(). This was fixed to work correct (the date format was wrong, now it uses dateField.toExternal()) The correct fix is to make TrieField.indexedToReadable call DateField's toExternal internally (like toExternal now does).
          Hide
          Chris Male added a comment -

          Hey Uwe,

          I spent quite some time tracking this down. The problem is that the dates cannot be parsed because they are lacking the compulsory 'Z' on the end (its required by the date parser).

          You need to change TrieDateField#indexedToReadable to:

          return wrappedField.indexedToReadable(indexedForm) + Z;

          with that change, the test now passes for me.

          You can see in DateField#indexedToReadable it does the same thing.

          Show
          Chris Male added a comment - Hey Uwe, I spent quite some time tracking this down. The problem is that the dates cannot be parsed because they are lacking the compulsory 'Z' on the end (its required by the date parser). You need to change TrieDateField#indexedToReadable to: return wrappedField.indexedToReadable(indexedForm) + Z; with that change, the test now passes for me. You can see in DateField#indexedToReadable it does the same thing.
          Hide
          Uwe Schindler added a comment -

          Updated patch (some improvements in TrieField converter methods).

          Still distributed numeric facetting (TestDistributedSearch) fails for trie dates - i have no idea why!

          I need help!

          Show
          Uwe Schindler added a comment - Updated patch (some improvements in TrieField converter methods). Still distributed numeric facetting (TestDistributedSearch) fails for trie dates - i have no idea why! I need help!
          Hide
          Uwe Schindler added a comment -

          MoreLikeThis problem solved, it was as I said. The test included a TrieInt field into the "similarity fields", so it was used to calculate similarity. As with previous Solr the TrieField was invisible to MLT this had no effect.
          By the way: There is a commented out part with explicitely the MLT field, but I dont understand it. It seems that it was never understood/supported.
          Now, all numeric fields should work with MLT.

          Now only the TestDistributedSearch is still failing with a strange date failure. I'll dig.

          Show
          Uwe Schindler added a comment - MoreLikeThis problem solved, it was as I said. The test included a TrieInt field into the "similarity fields", so it was used to calculate similarity. As with previous Solr the TrieField was invisible to MLT this had no effect. By the way: There is a commented out part with explicitely the MLT field, but I dont understand it. It seems that it was never understood/supported. Now, all numeric fields should work with MLT. Now only the TestDistributedSearch is still failing with a strange date failure. I'll dig.
          Hide
          Uwe Schindler added a comment -

          Patch applies to 3.2 branch only and needs the patch from LUCENE-3065 applied before:

          Here a first step in cutover of Solr to NumericField. Most tests work, except:

          • TestDistributedSearch, fails with a strange date problem - I have no idea what goes wrong
          • TestMoreLikeThis: fails because the returned documents are different than expected. The reason for this is simple: As TrieField's underlying Lucene fields now are NumericField, stringValue() returns something (in contrast, solr's old fields returned null because they were binary). This confuses maybe MoreLikeThis (needs maybe fixed in Lucene, I havent looked into the code). Maybe we should simply exclude those fields or fix the test (I prefer latter one, because the numerics should also taken into account).

          The following changes had to be made:

          • Cut over all places in Solr where Field instead of abstract Fieldable is used to Fieldable. This affects some leftover parts in various components (calling Document.getField instead of Document.getFieldable), but mainly SchemaField/FieldType: createField() now returns Fieldable
          • TrieDateField code duplication was removed, all methods delegate to a wrapped TrieField. There was also an inconsitency between TrieField and TrieDateField's toExternal(). This was fixed to work correct (the date format was wrong, now it uses dateField.toExternal())

          If somebody could help with the rest of the solr stuff and maybe test test test! Yonik? Ryan? There may be some itches not covered by tests.

          Thanks for help from Solr specialists (I am definitely not one, I am more afraid of the code than I can help)!!!

          Show
          Uwe Schindler added a comment - Patch applies to 3.2 branch only and needs the patch from LUCENE-3065 applied before: Here a first step in cutover of Solr to NumericField. Most tests work, except: TestDistributedSearch, fails with a strange date problem - I have no idea what goes wrong TestMoreLikeThis: fails because the returned documents are different than expected. The reason for this is simple: As TrieField's underlying Lucene fields now are NumericField, stringValue() returns something (in contrast, solr's old fields returned null because they were binary). This confuses maybe MoreLikeThis (needs maybe fixed in Lucene, I havent looked into the code). Maybe we should simply exclude those fields or fix the test (I prefer latter one, because the numerics should also taken into account). The following changes had to be made: Cut over all places in Solr where Field instead of abstract Fieldable is used to Fieldable. This affects some leftover parts in various components (calling Document.getField instead of Document.getFieldable), but mainly SchemaField/FieldType: createField() now returns Fieldable TrieDateField code duplication was removed, all methods delegate to a wrapped TrieField. There was also an inconsitency between TrieField and TrieDateField's toExternal(). This was fixed to work correct (the date format was wrong, now it uses dateField.toExternal()) If somebody could help with the rest of the solr stuff and maybe test test test! Yonik? Ryan? There may be some itches not covered by tests. Thanks for help from Solr specialists (I am definitely not one, I am more afraid of the code than I can help)!!!

            People

            • Assignee:
              Uwe Schindler
              Reporter:
              Uwe Schindler
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development