Lucene - Core
  1. Lucene - Core
  2. LUCENE-2883

Consolidate Solr & Lucene FunctionQuery into modules

    Details

    • Lucene Fields:
      New

      Description

      Spin-off from the dev list

        Issue Links

        There are no Sub-Tasks for this issue.

          Activity

          Hide
          Michael McCandless added a comment -

          +1, this is great Chris!

          Show
          Michael McCandless added a comment - +1, this is great Chris!
          Hide
          Robert Muir added a comment -

          Thanks for all your hard refactoring work here Chris!

          Show
          Robert Muir added a comment - Thanks for all your hard refactoring work here Chris!
          Hide
          Chris Male added a comment -

          Committed revision 1141749.

          Its done. Finally.

          Show
          Chris Male added a comment - Committed revision 1141749. Its done. Finally.
          Hide
          Chris Male added a comment -

          Rather than doing all the work in this issue, I'm going to spin off a few subtasks and resolve this one by one.

          Show
          Chris Male added a comment - Rather than doing all the work in this issue, I'm going to spin off a few subtasks and resolve this one by one.
          Show
          Michael McCandless added a comment - OK I created branch https://svn.apache.org/repos/asf/lucene/dev/branches/fqmodule_2883
          Hide
          Yonik Seeley added a comment -

          > Sort instances are like Query instances, and for many reasons should not be bound to any particular searcher.

          Yeah that is true. But ValueSource#getSort actually returns a SortField. Does the same apply to SortField instances?

          Yeah, that's part of a Sort.

          Also, ValueSource#weight(IndexSearcher) returns a new SortField as well, with a context containing the IndexSearcher. Consequently the new SortField is bound to that particular searcher.

          ValueSourceSortField.weight(IndexSearcher) - right. That's sort of how queries can be re-written too... and the returned temporary SortField is never retained. I guess I should have said that it should always be possible for Query and Sort instances to be independent of searchers and readers... but temporary ones can be created for the purposes of actually executing the query.

          Note that Solr's query cache uses Sort as part of the cache key too.

          Show
          Yonik Seeley added a comment - > Sort instances are like Query instances, and for many reasons should not be bound to any particular searcher. Yeah that is true. But ValueSource#getSort actually returns a SortField. Does the same apply to SortField instances? Yeah, that's part of a Sort. Also, ValueSource#weight(IndexSearcher) returns a new SortField as well, with a context containing the IndexSearcher. Consequently the new SortField is bound to that particular searcher. ValueSourceSortField.weight(IndexSearcher) - right. That's sort of how queries can be re-written too... and the returned temporary SortField is never retained. I guess I should have said that it should always be possible for Query and Sort instances to be independent of searchers and readers... but temporary ones can be created for the purposes of actually executing the query. Note that Solr's query cache uses Sort as part of the cache key too.
          Hide
          Chris Male added a comment -

          Is this desirable? IndexSearcher is pretty thin I know but is it fast enough to create that it has a nominal effect? If its faster than passing down the IndexSearcher then maybe its a good idea for anybody wanting an IndexSearcher to do this.

          Actually one flaw in this is that SolrIndexSearcher has overridden some functionality in IndexSearcher, such as SimilarityProvider, which is used NormValueSource.

          Show
          Chris Male added a comment - Is this desirable? IndexSearcher is pretty thin I know but is it fast enough to create that it has a nominal effect? If its faster than passing down the IndexSearcher then maybe its a good idea for anybody wanting an IndexSearcher to do this. Actually one flaw in this is that SolrIndexSearcher has overridden some functionality in IndexSearcher, such as SimilarityProvider, which is used NormValueSource.
          Hide
          Chris Male added a comment -

          I've opened SOLR-2533 to look at ways to standardise the ValueSource sort weighting API.

          Show
          Chris Male added a comment - I've opened SOLR-2533 to look at ways to standardise the ValueSource sort weighting API.
          Hide
          Chris Male added a comment -

          Hey Yonik,

          Its super to hear from you on this, it'll be a real help.

          Regarding weighting - function queries can contain normal queries, so anywhere a function query is used, it must be weighted first.

          Yup I've come to understand that. So the challenge is how to do this when a FunctionQuery is used to sort and not Query? Okay. I'm going to open an issue to see if we can address this better, maybe by extending SortField or something.

          Sort instances are like Query instances, and for many reasons should not be bound to any particular searcher.

          Yeah that is true. But ValueSource#getSort actually returns a SortField. Does the same apply to SortField instances? Also, ValueSource#weight(IndexSearcher) returns a new SortField as well, with a context containing the IndexSearcher. Consequently the new SortField is bound to that particular searcher.

          Show
          Chris Male added a comment - Hey Yonik, Its super to hear from you on this, it'll be a real help. Regarding weighting - function queries can contain normal queries, so anywhere a function query is used, it must be weighted first. Yup I've come to understand that. So the challenge is how to do this when a FunctionQuery is used to sort and not Query? Okay. I'm going to open an issue to see if we can address this better, maybe by extending SortField or something. Sort instances are like Query instances, and for many reasons should not be bound to any particular searcher. Yeah that is true. But ValueSource#getSort actually returns a SortField. Does the same apply to SortField instances? Also, ValueSource#weight(IndexSearcher) returns a new SortField as well, with a context containing the IndexSearcher. Consequently the new SortField is bound to that particular searcher.
          Hide
          Chris Male added a comment -

          So the goal here is to make the top-level searcher (IR) visible to the
          FQ's getValues? I think this pre-dated the cutover to
          AtomicReaderContext, which now provides the top reader? Maybe this
          isn't needed anymore...?

          Its very intentional (as Yonik has pointed out). It allows any Querys in the ValueSources to be weighted.

          Though QueryValueSource needs a searcher (but, seems to make one, from
          the top reader, if it wasn't provided one).

          Is this desirable? IndexSearcher is pretty thin I know but is it fast enough to create that it has a nominal effect? If its faster than passing down the IndexSearcher then maybe its a good idea for anybody wanting an IndexSearcher to do this.

          Good question... maybe we can do this on a branch?

          Absolutely, can you create one?

          Show
          Chris Male added a comment - So the goal here is to make the top-level searcher (IR) visible to the FQ's getValues? I think this pre-dated the cutover to AtomicReaderContext, which now provides the top reader? Maybe this isn't needed anymore...? Its very intentional (as Yonik has pointed out). It allows any Querys in the ValueSources to be weighted. Though QueryValueSource needs a searcher (but, seems to make one, from the top reader, if it wasn't provided one). Is this desirable? IndexSearcher is pretty thin I know but is it fast enough to create that it has a nominal effect? If its faster than passing down the IndexSearcher then maybe its a good idea for anybody wanting an IndexSearcher to do this. Good question... maybe we can do this on a branch? Absolutely, can you create one?
          Hide
          Yonik Seeley added a comment -

          Regarding weighting - function queries can contain normal queries, so anywhere a function query is used, it must be weighted first.

          When ValueSource#getSort is called (which is only in 1 place really), we can pass in the IndexSearcher, meaning the SortField can be 'weighted' then and there.

          Sort instances are like Query instances, and for many reasons should not be bound to any particular searcher.

          Show
          Yonik Seeley added a comment - Regarding weighting - function queries can contain normal queries, so anywhere a function query is used, it must be weighted first. When ValueSource#getSort is called (which is only in 1 place really), we can pass in the IndexSearcher, meaning the SortField can be 'weighted' then and there. Sort instances are like Query instances, and for many reasons should not be bound to any particular searcher.
          Hide
          Michael McCandless added a comment -

          Hmm good question. This looks to be related to sorting by FQ (SOLR-1297) because some FQs need to be weighted. Not sure what to do here yet... which FQs in particular require this?

          Both all of them and not many of them (complicated). The sorting of FQ
          functionality is necessary for all FQs in Solr since the user can sort
          by any FQ. However the extension made by the SolrSortField is the ability
          to create a 'weighted' SortField by passing in a IndexSearcher and having
          it stored in a Map. The Map is then made available to any DocValues when
          they create their values.

          So the goal here is to make the top-level searcher (IR) visible to the
          FQ's getValues? I think this pre-dated the cutover to
          AtomicReaderContext, which now provides the top reader? Maybe this
          isn't needed anymore...?

          This is when the 'not many' comes into effect. Only a few DocValues implementations
          use the contents of the Map. DocFreqValueSource for example uses the IndexSearcher
          in the Map. But I suppose there could be many user implementations that do.

          EG DocFreqValueSource could pull docFreq from the top reader?

          Though QueryValueSource needs a searcher (but, seems to make one, from
          the top reader, if it wasn't provided one).

          SolrSortField is currently used in SolrIndexSearcher to 'weight' the Sorts. I wonder
          whether we can simplify this? When ValueSource#getSort is called (which is only in 1
          place really), we can pass in the IndexSearcher, meaning the SortField can be 'weighted'
          then and there.

          Since SolrSortField is only used in this 1 place currently, we can then revisit dropping it?

          That seems good too?

          Do you think its worth opening an issue to address this first?

          Yes can you do that, and mark this issue as depending in it?

          I think apply 90/10 rule here? Start with the easy-to-move queries? We don't need initial go to be perfect... progress not perfection.

          Could we sort the initial commit out and then I can move them over in batches?
          Already have a 108k patch, I'd say moving what we can will push it towards 300k

          Good question... maybe we can do this on a branch?

          Do you have a sense of whether Solr's FQs are a superset of Lucene's? Ie, is there anything Lucene's FQs can do that Solr's can't?

          Solr FQs are hugely more advanced than the ValueSourceQuery based stuff in Lucene.
          Its not a full 1 to 1 change since the APIs are slightly different, but I'd say
          that we'd want users to use the FQ line of classes. I cant see anything in Lucene's
          VSQs that you couldn't do using FQs.

          OK then once we finally have the "superset" moved into the module we
          should remove Lucene's (deprecate on 3.x).

          Show
          Michael McCandless added a comment - Hmm good question. This looks to be related to sorting by FQ ( SOLR-1297 ) because some FQs need to be weighted. Not sure what to do here yet... which FQs in particular require this? Both all of them and not many of them (complicated). The sorting of FQ functionality is necessary for all FQs in Solr since the user can sort by any FQ. However the extension made by the SolrSortField is the ability to create a 'weighted' SortField by passing in a IndexSearcher and having it stored in a Map. The Map is then made available to any DocValues when they create their values. So the goal here is to make the top-level searcher (IR) visible to the FQ's getValues? I think this pre-dated the cutover to AtomicReaderContext, which now provides the top reader? Maybe this isn't needed anymore...? This is when the 'not many' comes into effect. Only a few DocValues implementations use the contents of the Map. DocFreqValueSource for example uses the IndexSearcher in the Map. But I suppose there could be many user implementations that do. EG DocFreqValueSource could pull docFreq from the top reader? Though QueryValueSource needs a searcher (but, seems to make one, from the top reader, if it wasn't provided one). SolrSortField is currently used in SolrIndexSearcher to 'weight' the Sorts. I wonder whether we can simplify this? When ValueSource#getSort is called (which is only in 1 place really), we can pass in the IndexSearcher, meaning the SortField can be 'weighted' then and there. Since SolrSortField is only used in this 1 place currently, we can then revisit dropping it? That seems good too? Do you think its worth opening an issue to address this first? Yes can you do that, and mark this issue as depending in it? I think apply 90/10 rule here? Start with the easy-to-move queries? We don't need initial go to be perfect... progress not perfection. Could we sort the initial commit out and then I can move them over in batches? Already have a 108k patch, I'd say moving what we can will push it towards 300k Good question... maybe we can do this on a branch? Do you have a sense of whether Solr's FQs are a superset of Lucene's? Ie, is there anything Lucene's FQs can do that Solr's can't? Solr FQs are hugely more advanced than the ValueSourceQuery based stuff in Lucene. Its not a full 1 to 1 change since the APIs are slightly different, but I'd say that we'd want users to use the FQ line of classes. I cant see anything in Lucene's VSQs that you couldn't do using FQs. OK then once we finally have the "superset" moved into the module we should remove Lucene's (deprecate on 3.x).
          Hide
          Chris Male added a comment - - edited

          I think we should move Mutable* over? Grouping module will need all of these, I think? (Ie if we want to allow users to group by arbitrary typed field).

          Yup okay.

          Hmm good question. This looks to be related to sorting by FQ (SOLR-1297) because some FQs need to be weighted. Not sure what to do here yet... which FQs in particular require this?

          Both all of them and not many of them (complicated). The sorting of FQ
          functionality is necessary for all FQs in Solr since the user can sort
          by any FQ. However the extension made by the SolrSortField is the ability
          to create a 'weighted' SortField by passing in a IndexSearcher and having
          it stored in a Map. The Map is then made available to any DocValues when
          they create their values.

          This is when the 'not many' comes into effect. Only a few DocValues implementations
          use the contents of the Map. DocFreqValueSource for example uses the IndexSearcher
          in the Map. But I suppose there could be many user implementations that do.

          SolrSortField is currently used in SolrIndexSearcher to 'weight' the Sorts. I wonder
          whether we can simplify this? When ValueSource#getSort is called (which is only in 1
          place really), we can pass in the IndexSearcher, meaning the SortField can be 'weighted'
          then and there.

          Since SolrSortField is only used in this 1 place currently, we can then revisit dropping it?

          Do you think its worth opening an issue to address this first?

          I think apply 90/10 rule here? Start with the easy-to-move queries? We don't need initial go to be perfect... progress not perfection.

          Could we sort the initial commit out and then I can move them over in batches?
          Already have a 108k patch, I'd say moving what we can will push it towards 300k

          Do you have a sense of whether Solr's FQs are a superset of Lucene's? Ie, is there anything Lucene's FQs can do that Solr's can't?

          Solr FQs are hugely more advanced than the ValueSourceQuery based stuff in Lucene.
          Its not a full 1 to 1 change since the APIs are slightly different, but I'd say
          that we'd want users to use the FQ line of classes. I cant see anything in Lucene's
          VSQs that you couldn't do using FQs.

          Show
          Chris Male added a comment - - edited I think we should move Mutable* over? Grouping module will need all of these, I think? (Ie if we want to allow users to group by arbitrary typed field). Yup okay. Hmm good question. This looks to be related to sorting by FQ ( SOLR-1297 ) because some FQs need to be weighted. Not sure what to do here yet... which FQs in particular require this? Both all of them and not many of them (complicated). The sorting of FQ functionality is necessary for all FQs in Solr since the user can sort by any FQ. However the extension made by the SolrSortField is the ability to create a 'weighted' SortField by passing in a IndexSearcher and having it stored in a Map. The Map is then made available to any DocValues when they create their values. This is when the 'not many' comes into effect. Only a few DocValues implementations use the contents of the Map. DocFreqValueSource for example uses the IndexSearcher in the Map. But I suppose there could be many user implementations that do. SolrSortField is currently used in SolrIndexSearcher to 'weight' the Sorts. I wonder whether we can simplify this? When ValueSource#getSort is called (which is only in 1 place really), we can pass in the IndexSearcher, meaning the SortField can be 'weighted' then and there. Since SolrSortField is only used in this 1 place currently, we can then revisit dropping it? Do you think its worth opening an issue to address this first? I think apply 90/10 rule here? Start with the easy-to-move queries? We don't need initial go to be perfect... progress not perfection. Could we sort the initial commit out and then I can move them over in batches? Already have a 108k patch, I'd say moving what we can will push it towards 300k Do you have a sense of whether Solr's FQs are a superset of Lucene's? Ie, is there anything Lucene's FQs can do that Solr's can't? Solr FQs are hugely more advanced than the ValueSourceQuery based stuff in Lucene. Its not a full 1 to 1 change since the APIs are slightly different, but I'd say that we'd want users to use the FQ line of classes. I cant see anything in Lucene's VSQs that you couldn't do using FQs.
          Hide
          Michael McCandless added a comment -

          Thanks Chris! The patch applies cleanly for me (after running the svn
          commands) and everything compiles.

          I think the patch is a great start, ie, we will need the low level
          infra used by FQs in the module.

          MutableValue & MutableFloatValue are used in the FunctionQuery code so I've pulled them into the module too. Should all the other Mutable*Value classes come too? Should they go into some other module?

          I think we should move Mutable* over? Grouping module will need all
          of these, I think? (Ie if we want to allow users to group by
          arbitrary typed field).

          What to return in ValueSource#getSortField which currently returns a SortField which implements SolrSortField. This is currently commented out so we can determine what best to do. Having this commented out breaks the Solr tests.

          Hmm good question. This looks to be related to sorting by FQ
          (SOLR-1297) because some FQs need to be weighted. Not sure what to do
          here yet... which FQs in particular require this?

          Many of the ValueSources and DocValues in Solr could be moved to the module, but not all of them. Some have dependencies on Solr dependencies / Solr core code.

          I think apply 90/10 rule here? Start with the easy-to-move queries?
          We don't need initial go to be perfect... progress not perfection.

          Lucene core's FunctionQuery stuff needs to be removed.

          Do you have a sense of whether Solr's FQs are a superset of Lucene's?
          Ie, is there anything Lucene's FQs can do that Solr's can't?

          Probably, as a separate issue, we should also move contrib/queries ->
          modules/queries. And I think the cool nested queries (LUCENE-2454)
          would also go into this module...

          Show
          Michael McCandless added a comment - Thanks Chris! The patch applies cleanly for me (after running the svn commands) and everything compiles. I think the patch is a great start, ie, we will need the low level infra used by FQs in the module. MutableValue & MutableFloatValue are used in the FunctionQuery code so I've pulled them into the module too. Should all the other Mutable*Value classes come too? Should they go into some other module? I think we should move Mutable* over? Grouping module will need all of these, I think? (Ie if we want to allow users to group by arbitrary typed field). What to return in ValueSource#getSortField which currently returns a SortField which implements SolrSortField. This is currently commented out so we can determine what best to do. Having this commented out breaks the Solr tests. Hmm good question. This looks to be related to sorting by FQ ( SOLR-1297 ) because some FQs need to be weighted. Not sure what to do here yet... which FQs in particular require this? Many of the ValueSources and DocValues in Solr could be moved to the module, but not all of them. Some have dependencies on Solr dependencies / Solr core code. I think apply 90/10 rule here? Start with the easy-to-move queries? We don't need initial go to be perfect... progress not perfection. Lucene core's FunctionQuery stuff needs to be removed. Do you have a sense of whether Solr's FQs are a superset of Lucene's? Ie, is there anything Lucene's FQs can do that Solr's can't? Probably, as a separate issue, we should also move contrib/queries -> modules/queries. And I think the cool nested queries ( LUCENE-2454 ) would also go into this module...
          Hide
          Chris Male added a comment -

          Script needed to run to use patch:

          svn mkdir --parents modules/queries/src/java/org/apache/lucene/queries/function
          svn move solr/src/java/org/apache/solr/search/function/FunctionQuery.java modules/queries/src/java/org/apache/lucene/queries/function/FunctionQuery.java
          svn move solr/src/java/org/apache/solr/search/function/ValueSource.java modules/queries/src/java/org/apache/lucene/queries/function/ValueSource.java
          svn move solr/src/java/org/apache/solr/search/function/DocValues.java modules/queries/src/java/org/apache/lucene/queries/function/DocValues.java
          svn move solr/src/java/org/apache/solr/search/MutableValue.java modules/queries/src/java/org/apache/lucene/queries/function/MutableValue.java
          svn move solr/src/java/org/apache/solr/search/MutableValueFloat.java modules/queries/src/java/org/apache/lucene/queries/function/MutableValueFloat.java
          
          Show
          Chris Male added a comment - Script needed to run to use patch: svn mkdir --parents modules/queries/src/java/org/apache/lucene/queries/function svn move solr/src/java/org/apache/solr/search/function/FunctionQuery.java modules/queries/src/java/org/apache/lucene/queries/function/FunctionQuery.java svn move solr/src/java/org/apache/solr/search/function/ValueSource.java modules/queries/src/java/org/apache/lucene/queries/function/ValueSource.java svn move solr/src/java/org/apache/solr/search/function/DocValues.java modules/queries/src/java/org/apache/lucene/queries/function/DocValues.java svn move solr/src/java/org/apache/solr/search/MutableValue.java modules/queries/src/java/org/apache/lucene/queries/function/MutableValue.java svn move solr/src/java/org/apache/solr/search/MutableValueFloat.java modules/queries/src/java/org/apache/lucene/queries/function/MutableValueFloat.java
          Hide
          Chris Male added a comment -

          Patch that factors out the core FunctionQuery stuff into a queries module. Theres alot of issues here but it does compile.

          The following issues need to be addressed:

          • MutableValue & MutableFloatValue are used in the FunctionQuery code so I've pulled them into the module too. Should all the other Mutable*Value classes come too? Should they go into some other module?
          • What to return in ValueSource#getSortField which currently returns a SortField which implements SolrSortField. This is currently commented out so we can determine what best to do. Having this commented out breaks the Solr tests.
          • Many of the ValueSources and DocValues in Solr could be moved to the module, but not all of them. Some have dependencies on Solr dependencies / Solr core code.
          • Module isn't full integrated into the build.xmls and dev-tools.
          • Lucene core's FunctionQuery stuff needs to be removed.

          I'll add a script that needs to be run before adding this patch shortly.

          Show
          Chris Male added a comment - Patch that factors out the core FunctionQuery stuff into a queries module. Theres alot of issues here but it does compile. The following issues need to be addressed: MutableValue & MutableFloatValue are used in the FunctionQuery code so I've pulled them into the module too. Should all the other Mutable*Value classes come too? Should they go into some other module? What to return in ValueSource#getSortField which currently returns a SortField which implements SolrSortField. This is currently commented out so we can determine what best to do. Having this commented out breaks the Solr tests. Many of the ValueSources and DocValues in Solr could be moved to the module, but not all of them. Some have dependencies on Solr dependencies / Solr core code. Module isn't full integrated into the build.xmls and dev-tools. Lucene core's FunctionQuery stuff needs to be removed. I'll add a script that needs to be run before adding this patch shortly.
          Hide
          Chris Male added a comment - - edited

          I would expect, for example, to be able to add a feature that probably only made sense for solr (as long as it didn't prevent lucene users from using function queries in general of course).

          Does any of these features you allude to rely on stuff that thats in Solr core? This impacts the dependencies this module will have.

          Show
          Chris Male added a comment - - edited I would expect, for example, to be able to add a feature that probably only made sense for solr (as long as it didn't prevent lucene users from using function queries in general of course). Does any of these features you allude to rely on stuff that thats in Solr core? This impacts the dependencies this module will have.
          Hide
          Robert Muir added a comment -

          The "status" and expectations for the code are one of the big issues here - it's not just about the technical issues of what a patch looks like. I would expect, for example, to be able to add a feature that probably only made sense for solr (as long as it didn't prevent lucene users from using function queries in general of course). Is that acceptable to everyone? I would not necessarily have the same expectation for other things in modules - it really depends on what it is. Does that make sense?

          Do you have an example of what type of feature you are talking about? I think it would really clarify things.

          Show
          Robert Muir added a comment - The "status" and expectations for the code are one of the big issues here - it's not just about the technical issues of what a patch looks like. I would expect, for example, to be able to add a feature that probably only made sense for solr (as long as it didn't prevent lucene users from using function queries in general of course). Is that acceptable to everyone? I would not necessarily have the same expectation for other things in modules - it really depends on what it is. Does that make sense? Do you have an example of what type of feature you are talking about? I think it would really clarify things.
          Hide
          Grant Ingersoll added a comment -

          Does that make sense?

          I think it does. I think, as usual, it can be handled case by case. Just as Robert pointed out, some analysis things stayed in Solr, I don't think anyone is going to argue that we should put things where they most make sense. In general, though, I think the modules affords us a lot of opportunity through more exposure at the loss of a little bit of flexibility in Solr, which, as you said can still be handled by doing that little piece in just Solr.

          Show
          Grant Ingersoll added a comment - Does that make sense? I think it does. I think, as usual, it can be handled case by case. Just as Robert pointed out, some analysis things stayed in Solr, I don't think anyone is going to argue that we should put things where they most make sense. In general, though, I think the modules affords us a lot of opportunity through more exposure at the loss of a little bit of flexibility in Solr, which, as you said can still be handled by doing that little piece in just Solr.
          Hide
          Yonik Seeley added a comment -

          hopefully we didnt screw this up in anyway for modules/analysis, I don't think Solr lost anything here.

          Right, modules/analysis never concerned me since the filters moved there were nice and self contained, they only already relied on the lucene analysis interfaces, would easy to make a solr-specific copy of one if really needed in the future, and left some of the solr specific stuff where it was. It all made sense and was a win-win.

          We've hung a lot of features on function queries in Solr, and I've had plans for more to come - which is why it's not acceptable for it to suddenly not be a core part of Solr.

          its shared code to support both projects.

          The "status" and expectations for the code are one of the big issues here - it's not just about the technical issues of what a patch looks like. I would expect, for example, to be able to add a feature that probably only made sense for solr (as long as it didn't prevent lucene users from using function queries in general of course). Is that acceptable to everyone? I would not necessarily have the same expectation for other things in modules - it really depends on what it is. Does that make sense?

          Show
          Yonik Seeley added a comment - hopefully we didnt screw this up in anyway for modules/analysis, I don't think Solr lost anything here. Right, modules/analysis never concerned me since the filters moved there were nice and self contained, they only already relied on the lucene analysis interfaces, would easy to make a solr-specific copy of one if really needed in the future, and left some of the solr specific stuff where it was. It all made sense and was a win-win. We've hung a lot of features on function queries in Solr, and I've had plans for more to come - which is why it's not acceptable for it to suddenly not be a core part of Solr. its shared code to support both projects. The "status" and expectations for the code are one of the big issues here - it's not just about the technical issues of what a patch looks like. I would expect, for example, to be able to add a feature that probably only made sense for solr (as long as it didn't prevent lucene users from using function queries in general of course). Is that acceptable to everyone? I would not necessarily have the same expectation for other things in modules - it really depends on what it is. Does that make sense?
          Hide
          Michael McCandless added a comment -

          As such, I don't think it makes sense to mark it as lucene.internal, but experimental does make sense.

          +1

          Show
          Michael McCandless added a comment - As such, I don't think it makes sense to mark it as lucene.internal, but experimental does make sense. +1
          Hide
          Grant Ingersoll added a comment -

          I thought this was understood about modules anyway, that we move the code there, all committers

          have access and its shared code to support both projects.

          Cool. I was just re-iterating it. As such, I don't think it makes sense to mark it as lucene.internal, but experimental does make sense.

          Show
          Grant Ingersoll added a comment - I thought this was understood about modules anyway, that we move the code there, all committers have access and its shared code to support both projects. Cool. I was just re-iterating it. As such, I don't think it makes sense to mark it as lucene.internal, but experimental does make sense.
          Hide
          Robert Muir added a comment -

          but with the distinction that changes in modules need to account for Solr as well as Lucene.

          I thought this was understood about modules anyway, that we move the code there, all committers
          have access and its shared code to support both projects.

          hopefully we didnt screw this up in anyway for modules/analysis, I don't think Solr lost anything here.

          Separately, for the record, we didnt merge ALL of Solr's analysis stuff into modules/analysis.
          Some of this stuff really is solr-specific (e.g. ReverseWildCard support that needs some schema + qp integration).
          This really solr-specific stuff that isn't generally useful stayed in Solr.

          So I think we should always consider this for future modules, it doesn't have to be a binary decision,
          but instead what is general and makes sense.

          Show
          Robert Muir added a comment - but with the distinction that changes in modules need to account for Solr as well as Lucene. I thought this was understood about modules anyway, that we move the code there, all committers have access and its shared code to support both projects. hopefully we didnt screw this up in anyway for modules/analysis, I don't think Solr lost anything here. Separately, for the record, we didnt merge ALL of Solr's analysis stuff into modules/analysis. Some of this stuff really is solr-specific (e.g. ReverseWildCard support that needs some schema + qp integration). This really solr-specific stuff that isn't generally useful stayed in Solr. So I think we should always consider this for future modules, it doesn't have to be a binary decision, but instead what is general and makes sense.
          Hide
          Simon Willnauer added a comment -

          Function queries (FQs) already have a pretty stable interface and we have already worked hard to maintain BWC for them since they are one of the more common areas that people actually do extend in Solr. In other words, FQs are already more interface than implementation.

          YES! +1

          As a corollary of that, more light on them will make them better, improve performance, etc.

          again +1

          I think we can maintain the interface and innovations side by side - that said, lets look at a patch and discuss further once we see code / moving paths etc.

          simon

          Show
          Simon Willnauer added a comment - Function queries (FQs) already have a pretty stable interface and we have already worked hard to maintain BWC for them since they are one of the more common areas that people actually do extend in Solr. In other words, FQs are already more interface than implementation. YES! +1 As a corollary of that, more light on them will make them better, improve performance, etc. again +1 I think we can maintain the interface and innovations side by side - that said, lets look at a patch and discuss further once we see code / moving paths etc. simon
          Hide
          Grant Ingersoll added a comment -

          I think it makes sense to make function queries a module and expose them more to Lucene users for a number of reasons.

          First off, though, I get the concern about implementation detail versus interface, but I think the tradeoff is worth it for several reasons.

          • Function queries (FQs) already have a pretty stable interface and we have already worked hard to maintain BWC for them since they are one of the more common areas that people actually do extend in Solr. In other words, FQs are already more interface than implementation.
          • I've always been stumped as to why more Lucene users don't use them and instead re-invent the wheel. The fact that it has taken this long frankly mystifies me
          • As a corollary of that, more light on them will make them better, improve performance, etc.
          • I agree with most here that the ability to do this kind of thing is exactly why we merged.

          Hmm I don't really understand that part either. You're concerned that you'll lose the freedom to add functionality that'd benefit Solr only?

          I don't think that is the concern. I think the concern is that if one only has to worry about implementation details and not about backwards compat, public API, etc. then it is a lot easier to innovate. It's the equivalent of refactoring IndexWriter public methods versus the underlying DocumentsWriter. Remember, in Solr, 99% of the public API is in the Request parameters, not in Java interface. As long as we maintain BWC for the Request interface (i.e. &start=0&rows=10&q=foo), we are more or less free to do what we want in the implementation layer.

          In the end, I'm +1 on moving it, but with the distinction that changes in modules need to account for Solr as well as Lucene.

          Show
          Grant Ingersoll added a comment - I think it makes sense to make function queries a module and expose them more to Lucene users for a number of reasons. First off, though, I get the concern about implementation detail versus interface, but I think the tradeoff is worth it for several reasons. Function queries (FQs) already have a pretty stable interface and we have already worked hard to maintain BWC for them since they are one of the more common areas that people actually do extend in Solr. In other words, FQs are already more interface than implementation. I've always been stumped as to why more Lucene users don't use them and instead re-invent the wheel. The fact that it has taken this long frankly mystifies me As a corollary of that, more light on them will make them better, improve performance, etc. I agree with most here that the ability to do this kind of thing is exactly why we merged. Hmm I don't really understand that part either. You're concerned that you'll lose the freedom to add functionality that'd benefit Solr only? I don't think that is the concern. I think the concern is that if one only has to worry about implementation details and not about backwards compat, public API, etc. then it is a lot easier to innovate. It's the equivalent of refactoring IndexWriter public methods versus the underlying DocumentsWriter. Remember, in Solr, 99% of the public API is in the Request parameters, not in Java interface. As long as we maintain BWC for the Request interface (i.e. &start=0&rows=10&q=foo), we are more or less free to do what we want in the implementation layer. In the end, I'm +1 on moving it, but with the distinction that changes in modules need to account for Solr as well as Lucene.
          Hide
          Simon Willnauer added a comment -

          Really back in the days where stuff like that was moved to solr there was a good reason for duplicating the func - lucene was on a totally different release schedule, bw. compat etc. But now since we are aligned with releases and we have the modules area there is really no good reason for stuff like that. If solr needs some feature in FuncQuery & friends we are going to add it. If we can fix it without breaking BWCompat we should do it - if not we discuss case by case! I have actually written some FuncQuery related code for customers and they would not be happy if we break bwcompat codewise (I exclude package naming here) - just saying that keeping it internal so that we can hack it as we like is not an option anyway!!

          just my $0.05

          simon

          Show
          Simon Willnauer added a comment - Really back in the days where stuff like that was moved to solr there was a good reason for duplicating the func - lucene was on a totally different release schedule, bw. compat etc. But now since we are aligned with releases and we have the modules area there is really no good reason for stuff like that. If solr needs some feature in FuncQuery & friends we are going to add it. If we can fix it without breaking BWCompat we should do it - if not we discuss case by case! I have actually written some FuncQuery related code for customers and they would not be happy if we break bwcompat codewise (I exclude package naming here) - just saying that keeping it internal so that we can hack it as we like is not an option anyway!! just my $0.05 simon
          Hide
          Chris Male added a comment -

          Hi,

          Hmm I don't really understand that part either. You're concerned that you'll lose the freedom to add functionality that'd benefit Solr only?

          Show
          Chris Male added a comment - Hi, Hmm I don't really understand that part either. You're concerned that you'll lose the freedom to add functionality that'd benefit Solr only?
          Show
          Yonik Seeley added a comment - Chris: #2 is just an attempted recap of this comment: https://issues.apache.org/jira/browse/LUCENE-2883?focusedCommentId=12987146&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12987146
          Hide
          Chris Male added a comment -

          Hey Yonik,

          I'm just trying to understand what you mean by 2). How is Solr's support of function queries downgraded if they dont have the Solr package name?

          I can appreciate the need to be able to make changes easily but with function queries becoming so widely used in Solr, Solr users and probably Lucene users, I think having a stable interface is a good thing.

          Show
          Chris Male added a comment - Hey Yonik, I'm just trying to understand what you mean by 2). How is Solr's support of function queries downgraded if they dont have the Solr package name? I can appreciate the need to be able to make changes easily but with function queries becoming so widely used in Solr, Solr users and probably Lucene users, I think having a stable interface is a good thing.
          Hide
          Yonik Seeley added a comment - - edited

          Let me try to recap my concerns here (2 main ones):
          1) changing implementation to interface

          • a different back compat policy for the module (i.e. the java interfaces only) could partially address this.
          • marking as lucene.internal could mostly address, but that's prob not what direct module users would prefer.

          2) downgrading of solr support for this core piece of solr

          • making it a solr module (retaining the solr package naming, etc) should mostly handle this issue.
          Show
          Yonik Seeley added a comment - - edited Let me try to recap my concerns here (2 main ones): 1) changing implementation to interface a different back compat policy for the module (i.e. the java interfaces only) could partially address this. marking as lucene.internal could mostly address, but that's prob not what direct module users would prefer. 2) downgrading of solr support for this core piece of solr making it a solr module (retaining the solr package naming, etc) should mostly handle this issue.
          Hide
          Simon Willnauer added a comment -

          I'm not against modularizing some solr functionality so lucene users can use it, but hopefully you understand why I think it should retain the solr package namespace and be a solr module (or at a minimum perhaps a lucene/solr module). It just wouldn't be fair to downgrade solr support for such a large core piece of solr, and have people cry foul when something is added that only solr can benefit from (as is done fairly regularly today since it's just part of solr).

          I don't give a sh.. about the package name - /modules is shared space - some is more luceneish some is more solrish. I also think that bwcomap is not a super big deal with modules either. Why don't we get a proposal patch wise and see how that goes... Still - the concerns are super minor IMO and should not keep us from making progress on the modularization side of things.

          Show
          Simon Willnauer added a comment - I'm not against modularizing some solr functionality so lucene users can use it, but hopefully you understand why I think it should retain the solr package namespace and be a solr module (or at a minimum perhaps a lucene/solr module). It just wouldn't be fair to downgrade solr support for such a large core piece of solr, and have people cry foul when something is added that only solr can benefit from (as is done fairly regularly today since it's just part of solr). I don't give a sh.. about the package name - /modules is shared space - some is more luceneish some is more solrish. I also think that bwcomap is not a super big deal with modules either. Why don't we get a proposal patch wise and see how that goes... Still - the concerns are super minor IMO and should not keep us from making progress on the modularization side of things.
          Hide
          Yonik Seeley added a comment -

          One concern some people have had is that lucene not become too tailored just for solr - a very valid point, and I've tried hard not to add anything to lucene that's solr specific or that only solr would benefit from. Nevertheless, there have been a number of occasions (normally on IRC) where there has been disagreement/debate and I've had statements of the form "solr's not the only lucene user you know" type of arguments thrown up.

          I'm not against modularizing some solr functionality so lucene users can use it, but hopefully you understand why I think it should retain the solr package namespace and be a solr module (or at a minimum perhaps a lucene/solr module). It just wouldn't be fair to downgrade solr support for such a large core piece of solr, and have people cry foul when something is added that only solr can benefit from (as is done fairly regularly today since it's just part of solr).

          Show
          Yonik Seeley added a comment - One concern some people have had is that lucene not become too tailored just for solr - a very valid point, and I've tried hard not to add anything to lucene that's solr specific or that only solr would benefit from. Nevertheless, there have been a number of occasions (normally on IRC) where there has been disagreement/debate and I've had statements of the form "solr's not the only lucene user you know" type of arguments thrown up. I'm not against modularizing some solr functionality so lucene users can use it, but hopefully you understand why I think it should retain the solr package namespace and be a solr module (or at a minimum perhaps a lucene/solr module). It just wouldn't be fair to downgrade solr support for such a large core piece of solr, and have people cry foul when something is added that only solr can benefit from (as is done fairly regularly today since it's just part of solr).
          Hide
          Mark Miller added a comment -

          I'm trying to follow this thread. If I extract the somewhat off topic bitterness I seem to see:

          1. There is interest in trying to unify function queries again.

          2. There is a concern - currently the Solr function queries are part of Solr - java code in Solr is changed on a whim. In Lucene, the bar is higher - java code is generally considered an impl detail in Solr, unless it's a plugin api. This has been an attractive quality in the past. Moving it to Lucene changes it to more of a supported API. Valid point if you ask me.

          3. A response was made about marking it experimental.

          4. A response was made about perhaps internal.

          The rest does not add much. We have not voted on this issue - we voted on broad topics - to be sorted out as we go. Lets discuss it for a bit before everyone just gets pissed off.

          Show
          Mark Miller added a comment - I'm trying to follow this thread. If I extract the somewhat off topic bitterness I seem to see: 1. There is interest in trying to unify function queries again. 2. There is a concern - currently the Solr function queries are part of Solr - java code in Solr is changed on a whim. In Lucene, the bar is higher - java code is generally considered an impl detail in Solr, unless it's a plugin api. This has been an attractive quality in the past. Moving it to Lucene changes it to more of a supported API. Valid point if you ask me. 3. A response was made about marking it experimental. 4. A response was made about perhaps internal. The rest does not add much. We have not voted on this issue - we voted on broad topics - to be sorted out as we go. Lets discuss it for a bit before everyone just gets pissed off.
          Hide
          Michael Busch added a comment -

          Seriously, I don't want to have this discussion everything somebody comes up with a /modules suggestion - the is the way to go, we agreed and voted on it and it passed.

          +1. I agree with Simon and Robert.

          Show
          Michael Busch added a comment - Seriously, I don't want to have this discussion everything somebody comes up with a /modules suggestion - the is the way to go, we agreed and voted on it and it passed. +1. I agree with Simon and Robert.
          Hide
          Simon Willnauer added a comment -

          Can't we consolidate them under a new toplevel module? modules/queries?

          This was the entire plan for this issue.

          it should be a solr module... keep the solr package names and make it clear that it's primary purpose is supporting higher level features in solr

          Are you serious? If you need to change something or make additions to such an interface / abstract class or whatever you should be able to do this. But I am totally against it if you say this is you internal playground and If I decide to change it entirely I will do so. I start getting the impression that this anti-modularization thing has nothing todo with code and development.. Lots of software has been evolving even with bwcompat etc. but hey this is not core - deprecation and changes should not be super hard to do and we always decided case by case. if it was easy / doable with bw compat we did it - if not we broke it so what it the big deal again? Seriously, I don't want to have this discussion everything somebody comes up with a /modules suggestion - the is the way to go, we agreed and voted on it and it passed.

          Show
          Simon Willnauer added a comment - Can't we consolidate them under a new toplevel module? modules/queries? This was the entire plan for this issue. it should be a solr module... keep the solr package names and make it clear that it's primary purpose is supporting higher level features in solr Are you serious? If you need to change something or make additions to such an interface / abstract class or whatever you should be able to do this. But I am totally against it if you say this is you internal playground and If I decide to change it entirely I will do so. I start getting the impression that this anti-modularization thing has nothing todo with code and development.. Lots of software has been evolving even with bwcompat etc. but hey this is not core - deprecation and changes should not be super hard to do and we always decided case by case. if it was easy / doable with bw compat we did it - if not we broke it so what it the big deal again? Seriously, I don't want to have this discussion everything somebody comes up with a /modules suggestion - the is the way to go, we agreed and voted on it and it passed.
          Hide
          Yonik Seeley added a comment -

          We can mark the classes as lucene.experimental?

          If they remain experimental I suppose, but lucene.internal would be a more accurate description.

          Show
          Yonik Seeley added a comment - We can mark the classes as lucene.experimental? If they remain experimental I suppose, but lucene.internal would be a more accurate description.
          Hide
          Robert Muir added a comment -

          Wait, why again did we merge lucene and solr? This is crazy-talk.

          I don't see a single valid reason why queries should be in solr-only.

          Show
          Robert Muir added a comment - Wait, why again did we merge lucene and solr? This is crazy-talk. I don't see a single valid reason why queries should be in solr-only.
          Hide
          Yonik Seeley added a comment -

          Not sure if I communicated the issue clearly: taking what is essentially implementation and trying to make it interface clearly has a cost.
          Function queries and the solr qparser architecture are constantly evolving, and wind all through solr.

          If we attempt to make this easier to use by lucene users by moving it out to a module then:

          • it should be a solr module... keep the solr package names and make it clear that it's primary purpose is supporting higher level features in solr
          • we should make it such that java interface back compatibility is not a requirement, even for point releases

          The other approach is to make a Lucene function query module (actually, we already have that), try to update it with stuff from solr, but make it's primary purpose to support the Java interfaces.

          Show
          Yonik Seeley added a comment - Not sure if I communicated the issue clearly: taking what is essentially implementation and trying to make it interface clearly has a cost. Function queries and the solr qparser architecture are constantly evolving, and wind all through solr. If we attempt to make this easier to use by lucene users by moving it out to a module then: it should be a solr module... keep the solr package names and make it clear that it's primary purpose is supporting higher level features in solr we should make it such that java interface back compatibility is not a requirement, even for point releases The other approach is to make a Lucene function query module (actually, we already have that), try to update it with stuff from solr, but make it's primary purpose to support the Java interfaces.
          Hide
          Michael McCandless added a comment -

          Can't we consolidate them under a new toplevel module? modules/queries?

          We can mark the classes as lucene.experimental? Then we are free to iterate quickly. Does that address your concern Yonik?

          Show
          Michael McCandless added a comment - Can't we consolidate them under a new toplevel module? modules/queries? We can mark the classes as lucene.experimental? Then we are free to iterate quickly. Does that address your concern Yonik?
          Hide
          Simon Willnauer added a comment -

          One issue here is the different purposes for lucene and solr function queries.

          Yonik, if that is your only issue then we are good to go. I don't think that moving stuff to modules changes anything how we develop software. Modularization, decoupling, interfaces etc. you know how to work with those ey? so hey what is really the point here, this modularization is a key point of merging development with lucene and everytime somebody proposes something like this you fear that that monolithic thing under /solr could become more modular and decoupled. I don't know why this is the case but we should and will move on with modularization. Folks will use it once its there, thats for sure. Same is true for faceting, replication, queryparsers, functionparser... those are on the list!

          Show
          Simon Willnauer added a comment - One issue here is the different purposes for lucene and solr function queries. Yonik, if that is your only issue then we are good to go. I don't think that moving stuff to modules changes anything how we develop software. Modularization, decoupling, interfaces etc. you know how to work with those ey? so hey what is really the point here, this modularization is a key point of merging development with lucene and everytime somebody proposes something like this you fear that that monolithic thing under /solr could become more modular and decoupled. I don't know why this is the case but we should and will move on with modularization. Folks will use it once its there, thats for sure. Same is true for faceting, replication, queryparsers, functionparser... those are on the list!
          Hide
          Yonik Seeley added a comment -

          One issue here is the different purposes for lucene and solr function queries.
          Solr's function queries have always evolved at a rapid pace (and are continuing to evolve) to support higher level features and interfaces in Solr. They are able to evolve rapidly because they are seen more as an implementation detail rather than interface classes, and I'd hate to lose that. So if we do try to make Solr's function queries more accessible to lucene users (again), it should be as a Solr module. As we can see from history and usage, function queries are critically important to Solr, but are obviously not to Lucene.

          Show
          Yonik Seeley added a comment - One issue here is the different purposes for lucene and solr function queries. Solr's function queries have always evolved at a rapid pace (and are continuing to evolve) to support higher level features and interfaces in Solr. They are able to evolve rapidly because they are seen more as an implementation detail rather than interface classes, and I'd hate to lose that. So if we do try to make Solr's function queries more accessible to lucene users (again), it should be as a Solr module. As we can see from history and usage, function queries are critically important to Solr, but are obviously not to Lucene.

            People

            • Assignee:
              Chris Male
              Reporter:
              Simon Willnauer
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development