Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-7283

Move SlowCompositeReaderWrapper and uninverting package to solr sources

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.1, master (7.0)
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Spinoff from LUCENE-6766, where we fixed index-time sorting to have first class support in Lucene's ore, and no longer use SlowCompositeReaderWrapper.

      This is a dangerous, long living class, that tries to pretend a set of N segments is actually just a single segment. It's a leaky abstraction, has poor performance, and puts undue pressure on the APIs of new Lucene features to try to keep up this illusion.

      With LUCENE-6766, finally all usage of this class (except for UninvertedReader tests, which should maybe also move out?) has been removed from Lucene, so I think we should move it to Solr. This may also lead to a solution for LUCENE-7086 since e.g. the class could tap into solr's schema to "know" how to handle points fields properly.

      1. LUCENE-7283.patch
        766 kB
        Michael McCandless

        Issue Links

          Activity

          Hide
          dsmiley David Smiley added a comment -

          Maybe it's worth asking on java-user@lucene.apache.org to get a sense from the community what they think? I have no idea if it's used or not outside of Solr. It's already outside core, which is good.

          Show
          dsmiley David Smiley added a comment - Maybe it's worth asking on java-user@lucene.apache.org to get a sense from the community what they think? I have no idea if it's used or not outside of Solr. It's already outside core, which is good.
          Hide
          jpountz Adrien Grand added a comment -

          I think it is fine to remove it from Lucene even if it is used in the wild. After all, our goal is that users stop using this class. If someone really needs the functionality, they can still fork the code?

          Show
          jpountz Adrien Grand added a comment - I think it is fine to remove it from Lucene even if it is used in the wild. After all, our goal is that users stop using this class. If someone really needs the functionality, they can still fork the code?
          Hide
          dsmiley David Smiley added a comment -

          Sure, +1. It's a simple class any way.

          Show
          dsmiley David Smiley added a comment - Sure, +1. It's a simple class any way.
          Hide
          mikemccand Michael McCandless added a comment -

          Here's a rote patch, moving SlowCompositeReaderWrapper and also the uninverted package from org.apache.lucene... to org.apache.solr....

          Show
          mikemccand Michael McCandless added a comment - Here's a rote patch, moving SlowCompositeReaderWrapper and also the uninverted package from org.apache.lucene... to org.apache.solr... .
          Hide
          ichattopadhyaya Ishan Chattopadhyaya added a comment - - edited

          +1, thanks Mike.

          Show
          ichattopadhyaya Ishan Chattopadhyaya added a comment - - edited +1, thanks Mike.
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit baa5344e688faf1f45845d52ec236e165b519bda in lucene-solr's branch refs/heads/master from Mike McCandless
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=baa5344 ]

          LUCENE-7283: move SlowCompositeReaderWrapper and uninverting to solr

          Show
          jira-bot ASF subversion and git services added a comment - Commit baa5344e688faf1f45845d52ec236e165b519bda in lucene-solr's branch refs/heads/master from Mike McCandless [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=baa5344 ] LUCENE-7283 : move SlowCompositeReaderWrapper and uninverting to solr
          Hide
          dsmiley David Smiley added a comment -

          Will this be master only or also 6x? I'm not sure what the Lucene user community might think about both UninvertingReader and SlowCompositeReaderWrapper capability disappearing on them in 6.1. Granted most consumers of Lucene are through ES or Solr but it would be a shame not to consider direct Lucene users. Neither of these classes are marked experimental.

          p.s. please update the description to include UninvertingReader (which is much larger in scope than just SCW and unrelated to SCW)

          Show
          dsmiley David Smiley added a comment - Will this be master only or also 6x? I'm not sure what the Lucene user community might think about both UninvertingReader and SlowCompositeReaderWrapper capability disappearing on them in 6.1. Granted most consumers of Lucene are through ES or Solr but it would be a shame not to consider direct Lucene users. Neither of these classes are marked experimental. p.s. please update the description to include UninvertingReader (which is much larger in scope than just SCW and unrelated to SCW)
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 50572bf0b3ebcd4cfba03c1da3459da90106db68 in lucene-solr's branch refs/heads/branch_6x from Mike McCandless
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=50572bf ]

          LUCENE-7283: deprecate SlowCompositeReaderWrapper and uninverting package

          Show
          jira-bot ASF subversion and git services added a comment - Commit 50572bf0b3ebcd4cfba03c1da3459da90106db68 in lucene-solr's branch refs/heads/branch_6x from Mike McCandless [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=50572bf ] LUCENE-7283 : deprecate SlowCompositeReaderWrapper and uninverting package
          Hide
          mikemccand Michael McCandless added a comment -

          Will this be master only or also 6x?

          I just deprecated for 6.1.

          Granted most consumers of Lucene are through ES or Solr but it would be a shame not to consider direct Lucene users.

          I try hard to remember the Lucene direct users too!

          p.s. please update the description to include UninvertingReader (which is much larger in scope than just SCW and unrelated to SCW)

          OK will do.

          Show
          mikemccand Michael McCandless added a comment - Will this be master only or also 6x? I just deprecated for 6.1. Granted most consumers of Lucene are through ES or Solr but it would be a shame not to consider direct Lucene users. I try hard to remember the Lucene direct users too! p.s. please update the description to include UninvertingReader (which is much larger in scope than just SCW and unrelated to SCW) OK will do.
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 5525f429288cf8480ae7b6dc1438918e809a242c in lucene-solr's branch refs/heads/branch_6x from Yonik Seeley
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=5525f42 ]

          SOLR-9160: Sync 6x and 7.0 move of UninvertingReader, SlowCompositeReaderWrapper for Solr (LUCENE-7283)

          Show
          jira-bot ASF subversion and git services added a comment - Commit 5525f429288cf8480ae7b6dc1438918e809a242c in lucene-solr's branch refs/heads/branch_6x from Yonik Seeley [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=5525f42 ] SOLR-9160 : Sync 6x and 7.0 move of UninvertingReader, SlowCompositeReaderWrapper for Solr ( LUCENE-7283 )
          Hide
          yseeley@gmail.com Yonik Seeley added a comment -

          p.s. please update the description to include UninvertingReader (which is much larger in scope than just SCW and unrelated to SCW)

          Indeed, the original JIRA description of "Move SlowCompositeReaderWrapper to solr sources" doesn't begin to describe the impact this issue will have on normal Lucene users in 7.0. I imagine they will be quite surprised to see everything about the FieldCache removed and left with no ability to Sort on an indexed-only field.

          It really seems like there should be a lot more consensus around removing functionality on this scale.

          Show
          yseeley@gmail.com Yonik Seeley added a comment - p.s. please update the description to include UninvertingReader (which is much larger in scope than just SCW and unrelated to SCW) Indeed, the original JIRA description of "Move SlowCompositeReaderWrapper to solr sources" doesn't begin to describe the impact this issue will have on normal Lucene users in 7.0. I imagine they will be quite surprised to see everything about the FieldCache removed and left with no ability to Sort on an indexed-only field. It really seems like there should be a lot more consensus around removing functionality on this scale.
          Hide
          mikemccand Michael McCandless added a comment -

          I imagine they will be quite surprised to see everything about the FieldCache removed and left with no ability to Sort on an indexed-only field.

          I doubt that.

          In ancient Lucene versions, sorting was not a "first class citizen" and hacks like field cache were necessary.

          But then, long ago, we added doc values, and we've been saying for years now that if you want to sort by X you need to index X as doc values. This change (in 7.0 only, just deprecated in 6.x) should come as a surprise to nobody.

          Doc values are far more efficient than the the legacy field cache: no up front penalty the first time you sort on the field, no massive heap usage, no heavy full GC cost. Instead the OS manages the hot pages and swaps as necessary, thanks to efficient doc values codec formats.

          And such theoretical users have plenty of options: they can use Solr, or only use UninvertingReader or SlowCompositeReaderWrapper out of Solr's sources, or fork them for their own purposes, or reindex with the much-more-efficient doc values.

          Such horribly inefficient legacy code really does not belong in Lucene (or Solr!).

          Show
          mikemccand Michael McCandless added a comment - I imagine they will be quite surprised to see everything about the FieldCache removed and left with no ability to Sort on an indexed-only field. I doubt that. In ancient Lucene versions, sorting was not a "first class citizen" and hacks like field cache were necessary. But then, long ago, we added doc values, and we've been saying for years now that if you want to sort by X you need to index X as doc values. This change (in 7.0 only, just deprecated in 6.x) should come as a surprise to nobody. Doc values are far more efficient than the the legacy field cache: no up front penalty the first time you sort on the field, no massive heap usage, no heavy full GC cost. Instead the OS manages the hot pages and swaps as necessary, thanks to efficient doc values codec formats. And such theoretical users have plenty of options: they can use Solr, or only use UninvertingReader or SlowCompositeReaderWrapper out of Solr's sources, or fork them for their own purposes, or reindex with the much-more-efficient doc values. Such horribly inefficient legacy code really does not belong in Lucene (or Solr!).
          Hide
          yseeley@gmail.com Yonik Seeley added a comment -

          In ancient Lucene versions, sorting was not a "first class citizen" and hacks like field cache were necessary.

          Yes, a hack that has been with us for more than 12 years already (FieldCache came about in 2004).

          Best practice is one thing (Solr uses docvalues by default), but removing the option is different.

          After a quick google, it doesn't appear like you're removing this ability from elasticsearch (fielddata is the ES equivalent of the Lucene FieldCache):
          https://www.elastic.co/guide/en/elasticsearch/reference/5.x/modules-fielddata.html
          https://www.elastic.co/guide/en/elasticsearch/reference/current/fielddata.html

          Show
          yseeley@gmail.com Yonik Seeley added a comment - In ancient Lucene versions, sorting was not a "first class citizen" and hacks like field cache were necessary. Yes, a hack that has been with us for more than 12 years already (FieldCache came about in 2004). Best practice is one thing (Solr uses docvalues by default), but removing the option is different. After a quick google, it doesn't appear like you're removing this ability from elasticsearch (fielddata is the ES equivalent of the Lucene FieldCache): https://www.elastic.co/guide/en/elasticsearch/reference/5.x/modules-fielddata.html https://www.elastic.co/guide/en/elasticsearch/reference/current/fielddata.html
          Hide
          jpountz Adrien Grand added a comment -

          On the other hand, doc values have been there for 4 years now. Like Solr, Elasticsearch has some history with FieldCache which makes the removal take time. But it is on the way out: the upcoming release based on Lucene 6 will only allow it on text fields, and as an opt-in (disabled by default) and all other fields (numerics, keywords, etc.) will enforce usage of doc values for sorting or faceting. And I truly hope that we can entirely get rid of it when we upgrade to Lucene 7. Now that we have had doc values for 4 years, I think it's really time to get rid of FieldCache. Apps that truly need it for a bit longer for backward compatibility guarantees can still fork the code like Mike suggested.

          Show
          jpountz Adrien Grand added a comment - On the other hand, doc values have been there for 4 years now. Like Solr, Elasticsearch has some history with FieldCache which makes the removal take time. But it is on the way out: the upcoming release based on Lucene 6 will only allow it on text fields, and as an opt-in (disabled by default) and all other fields (numerics, keywords, etc.) will enforce usage of doc values for sorting or faceting. And I truly hope that we can entirely get rid of it when we upgrade to Lucene 7. Now that we have had doc values for 4 years, I think it's really time to get rid of FieldCache. Apps that truly need it for a bit longer for backward compatibility guarantees can still fork the code like Mike suggested.
          Hide
          yseeley@gmail.com Yonik Seeley added a comment -

          I think it's really time to get rid of FieldCache

          I may have missed it, but I didn't see any mailing list discussion (or any other discussion) around this.
          The main takeaway here is a reminder that consensus around bigger decisions needs to be done in the open.

          Show
          yseeley@gmail.com Yonik Seeley added a comment - I think it's really time to get rid of FieldCache I may have missed it, but I didn't see any mailing list discussion (or any other discussion) around this. The main takeaway here is a reminder that consensus around bigger decisions needs to be done in the open.

            People

            • Assignee:
              mikemccand Michael McCandless
              Reporter:
              mikemccand Michael McCandless
            • Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development