Details

    • Improvement
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 6.1, 7.0
    • None
    • None
    • New

    Description

      Spinoff from LUCENE-6766, where we fixed index-time sorting to have first class support in Lucene's ore, and no longer use SlowCompositeReaderWrapper.

      This is a dangerous, long living class, that tries to pretend a set of N segments is actually just a single segment. It's a leaky abstraction, has poor performance, and puts undue pressure on the APIs of new Lucene features to try to keep up this illusion.

      With LUCENE-6766, finally all usage of this class (except for UninvertedReader tests, which should maybe also move out?) has been removed from Lucene, so I think we should move it to Solr. This may also lead to a solution for LUCENE-7086 since e.g. the class could tap into solr's schema to "know" how to handle points fields properly.

      Attachments

        1. LUCENE-7283.patch
          766 kB
          Michael McCandless

        Issue Links

          Activity

            dsmiley David Smiley added a comment -

            Maybe it's worth asking on java-user@lucene.apache.org to get a sense from the community what they think? I have no idea if it's used or not outside of Solr. It's already outside core, which is good.

            dsmiley David Smiley added a comment - Maybe it's worth asking on java-user@lucene.apache.org to get a sense from the community what they think? I have no idea if it's used or not outside of Solr. It's already outside core, which is good.
            jpountz Adrien Grand added a comment -

            I think it is fine to remove it from Lucene even if it is used in the wild. After all, our goal is that users stop using this class. If someone really needs the functionality, they can still fork the code?

            jpountz Adrien Grand added a comment - I think it is fine to remove it from Lucene even if it is used in the wild. After all, our goal is that users stop using this class. If someone really needs the functionality, they can still fork the code?
            dsmiley David Smiley added a comment -

            Sure, +1. It's a simple class any way.

            dsmiley David Smiley added a comment - Sure, +1. It's a simple class any way.

            Here's a rote patch, moving SlowCompositeReaderWrapper and also the uninverted package from org.apache.lucene... to org.apache.solr....

            mikemccand Michael McCandless added a comment - Here's a rote patch, moving SlowCompositeReaderWrapper and also the uninverted package from org.apache.lucene... to org.apache.solr... .
            ichattopadhyaya Ishan Chattopadhyaya added a comment - - edited

            +1, thanks Mike.

            ichattopadhyaya Ishan Chattopadhyaya added a comment - - edited +1, thanks Mike.

            Commit baa5344e688faf1f45845d52ec236e165b519bda in lucene-solr's branch refs/heads/master from Mike McCandless
            [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=baa5344 ]

            LUCENE-7283: move SlowCompositeReaderWrapper and uninverting to solr

            jira-bot ASF subversion and git services added a comment - Commit baa5344e688faf1f45845d52ec236e165b519bda in lucene-solr's branch refs/heads/master from Mike McCandless [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=baa5344 ] LUCENE-7283 : move SlowCompositeReaderWrapper and uninverting to solr
            dsmiley David Smiley added a comment -

            Will this be master only or also 6x? I'm not sure what the Lucene user community might think about both UninvertingReader and SlowCompositeReaderWrapper capability disappearing on them in 6.1. Granted most consumers of Lucene are through ES or Solr but it would be a shame not to consider direct Lucene users. Neither of these classes are marked experimental.

            p.s. please update the description to include UninvertingReader (which is much larger in scope than just SCW and unrelated to SCW)

            dsmiley David Smiley added a comment - Will this be master only or also 6x? I'm not sure what the Lucene user community might think about both UninvertingReader and SlowCompositeReaderWrapper capability disappearing on them in 6.1. Granted most consumers of Lucene are through ES or Solr but it would be a shame not to consider direct Lucene users. Neither of these classes are marked experimental. p.s. please update the description to include UninvertingReader (which is much larger in scope than just SCW and unrelated to SCW)

            Commit 50572bf0b3ebcd4cfba03c1da3459da90106db68 in lucene-solr's branch refs/heads/branch_6x from Mike McCandless
            [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=50572bf ]

            LUCENE-7283: deprecate SlowCompositeReaderWrapper and uninverting package

            jira-bot ASF subversion and git services added a comment - Commit 50572bf0b3ebcd4cfba03c1da3459da90106db68 in lucene-solr's branch refs/heads/branch_6x from Mike McCandless [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=50572bf ] LUCENE-7283 : deprecate SlowCompositeReaderWrapper and uninverting package

            Will this be master only or also 6x?

            I just deprecated for 6.1.

            Granted most consumers of Lucene are through ES or Solr but it would be a shame not to consider direct Lucene users.

            I try hard to remember the Lucene direct users too!

            p.s. please update the description to include UninvertingReader (which is much larger in scope than just SCW and unrelated to SCW)

            OK will do.

            mikemccand Michael McCandless added a comment - Will this be master only or also 6x? I just deprecated for 6.1. Granted most consumers of Lucene are through ES or Solr but it would be a shame not to consider direct Lucene users. I try hard to remember the Lucene direct users too! p.s. please update the description to include UninvertingReader (which is much larger in scope than just SCW and unrelated to SCW) OK will do.

            Commit 5525f429288cf8480ae7b6dc1438918e809a242c in lucene-solr's branch refs/heads/branch_6x from Yonik Seeley
            [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=5525f42 ]

            SOLR-9160: Sync 6x and 7.0 move of UninvertingReader, SlowCompositeReaderWrapper for Solr (LUCENE-7283)

            jira-bot ASF subversion and git services added a comment - Commit 5525f429288cf8480ae7b6dc1438918e809a242c in lucene-solr's branch refs/heads/branch_6x from Yonik Seeley [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=5525f42 ] SOLR-9160 : Sync 6x and 7.0 move of UninvertingReader, SlowCompositeReaderWrapper for Solr ( LUCENE-7283 )
            yseeley@gmail.com Yonik Seeley added a comment -

            p.s. please update the description to include UninvertingReader (which is much larger in scope than just SCW and unrelated to SCW)

            Indeed, the original JIRA description of "Move SlowCompositeReaderWrapper to solr sources" doesn't begin to describe the impact this issue will have on normal Lucene users in 7.0. I imagine they will be quite surprised to see everything about the FieldCache removed and left with no ability to Sort on an indexed-only field.

            It really seems like there should be a lot more consensus around removing functionality on this scale.

            yseeley@gmail.com Yonik Seeley added a comment - p.s. please update the description to include UninvertingReader (which is much larger in scope than just SCW and unrelated to SCW) Indeed, the original JIRA description of "Move SlowCompositeReaderWrapper to solr sources" doesn't begin to describe the impact this issue will have on normal Lucene users in 7.0. I imagine they will be quite surprised to see everything about the FieldCache removed and left with no ability to Sort on an indexed-only field. It really seems like there should be a lot more consensus around removing functionality on this scale.

            I imagine they will be quite surprised to see everything about the FieldCache removed and left with no ability to Sort on an indexed-only field.

            I doubt that.

            In ancient Lucene versions, sorting was not a "first class citizen" and hacks like field cache were necessary.

            But then, long ago, we added doc values, and we've been saying for years now that if you want to sort by X you need to index X as doc values. This change (in 7.0 only, just deprecated in 6.x) should come as a surprise to nobody.

            Doc values are far more efficient than the the legacy field cache: no up front penalty the first time you sort on the field, no massive heap usage, no heavy full GC cost. Instead the OS manages the hot pages and swaps as necessary, thanks to efficient doc values codec formats.

            And such theoretical users have plenty of options: they can use Solr, or only use UninvertingReader or SlowCompositeReaderWrapper out of Solr's sources, or fork them for their own purposes, or reindex with the much-more-efficient doc values.

            Such horribly inefficient legacy code really does not belong in Lucene (or Solr!).

            mikemccand Michael McCandless added a comment - I imagine they will be quite surprised to see everything about the FieldCache removed and left with no ability to Sort on an indexed-only field. I doubt that. In ancient Lucene versions, sorting was not a "first class citizen" and hacks like field cache were necessary. But then, long ago, we added doc values, and we've been saying for years now that if you want to sort by X you need to index X as doc values. This change (in 7.0 only, just deprecated in 6.x) should come as a surprise to nobody. Doc values are far more efficient than the the legacy field cache: no up front penalty the first time you sort on the field, no massive heap usage, no heavy full GC cost. Instead the OS manages the hot pages and swaps as necessary, thanks to efficient doc values codec formats. And such theoretical users have plenty of options: they can use Solr, or only use UninvertingReader or SlowCompositeReaderWrapper out of Solr's sources, or fork them for their own purposes, or reindex with the much-more-efficient doc values. Such horribly inefficient legacy code really does not belong in Lucene (or Solr!).
            yseeley@gmail.com Yonik Seeley added a comment -

            In ancient Lucene versions, sorting was not a "first class citizen" and hacks like field cache were necessary.

            Yes, a hack that has been with us for more than 12 years already (FieldCache came about in 2004).

            Best practice is one thing (Solr uses docvalues by default), but removing the option is different.

            After a quick google, it doesn't appear like you're removing this ability from elasticsearch (fielddata is the ES equivalent of the Lucene FieldCache):
            https://www.elastic.co/guide/en/elasticsearch/reference/5.x/modules-fielddata.html
            https://www.elastic.co/guide/en/elasticsearch/reference/current/fielddata.html

            yseeley@gmail.com Yonik Seeley added a comment - In ancient Lucene versions, sorting was not a "first class citizen" and hacks like field cache were necessary. Yes, a hack that has been with us for more than 12 years already (FieldCache came about in 2004). Best practice is one thing (Solr uses docvalues by default), but removing the option is different. After a quick google, it doesn't appear like you're removing this ability from elasticsearch (fielddata is the ES equivalent of the Lucene FieldCache): https://www.elastic.co/guide/en/elasticsearch/reference/5.x/modules-fielddata.html https://www.elastic.co/guide/en/elasticsearch/reference/current/fielddata.html
            jpountz Adrien Grand added a comment -

            On the other hand, doc values have been there for 4 years now. Like Solr, Elasticsearch has some history with FieldCache which makes the removal take time. But it is on the way out: the upcoming release based on Lucene 6 will only allow it on text fields, and as an opt-in (disabled by default) and all other fields (numerics, keywords, etc.) will enforce usage of doc values for sorting or faceting. And I truly hope that we can entirely get rid of it when we upgrade to Lucene 7. Now that we have had doc values for 4 years, I think it's really time to get rid of FieldCache. Apps that truly need it for a bit longer for backward compatibility guarantees can still fork the code like Mike suggested.

            jpountz Adrien Grand added a comment - On the other hand, doc values have been there for 4 years now. Like Solr, Elasticsearch has some history with FieldCache which makes the removal take time. But it is on the way out: the upcoming release based on Lucene 6 will only allow it on text fields, and as an opt-in (disabled by default) and all other fields (numerics, keywords, etc.) will enforce usage of doc values for sorting or faceting. And I truly hope that we can entirely get rid of it when we upgrade to Lucene 7. Now that we have had doc values for 4 years, I think it's really time to get rid of FieldCache. Apps that truly need it for a bit longer for backward compatibility guarantees can still fork the code like Mike suggested.
            yseeley@gmail.com Yonik Seeley added a comment -

            I think it's really time to get rid of FieldCache

            I may have missed it, but I didn't see any mailing list discussion (or any other discussion) around this.
            The main takeaway here is a reminder that consensus around bigger decisions needs to be done in the open.

            yseeley@gmail.com Yonik Seeley added a comment - I think it's really time to get rid of FieldCache I may have missed it, but I didn't see any mailing list discussion (or any other discussion) around this. The main takeaway here is a reminder that consensus around bigger decisions needs to be done in the open.
            tomoko Tomoko Uchida added a comment -

            This issue was moved to GitHub issue: #8338.

            tomoko Tomoko Uchida added a comment - This issue was moved to GitHub issue: #8338 .

            People

              mikemccand Michael McCandless
              mikemccand Michael McCandless
              Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Slack

                  Issue deployment