Details

    • Type: Improvement Improvement
    • Status: Open
    • Priority: Minor Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      On segmentReader, liveDocs is volatile because they used to be loaded on demand.
      This no longer appears to be the case... liveDocs are always loaded up front.
      This also means that getLiveDocs() can never fail (even after close), and we can remove the ensureOpen call.
      Minor optimizations, but volatile reads still do prevent optimizations across that boundary.

      1. LUCENE-3500.patch
        0.9 kB
        Yonik Seeley

        Activity

        Hide
        Robert Muir added a comment -

        Where is getLiveDocs called in a tight loop?

        I can see this making sense for the old IR.isDeleted, but we shouldnt be calling getLiveDocs over and over again!

        Show
        Robert Muir added a comment - Where is getLiveDocs called in a tight loop? I can see this making sense for the old IR.isDeleted, but we shouldnt be calling getLiveDocs over and over again!
        Hide
        Yonik Seeley added a comment -

        Not a tight loop perhaps, but I've regularly seen fast changing indexes build up to 100 segments.
        And if Solr's DocSet filter implementations need to call it again to avoid useless wrapping, that's 200 times per query (and 400 volatile reads) that aren't really needed.
        Sure, it would be hard to measure the improvement (hence the "minor"), but it's more about the chipping away at inefficiencies. I guess I'd turn it around and say "why is this volatile?"... there's no reason for it to be.

        Show
        Yonik Seeley added a comment - Not a tight loop perhaps, but I've regularly seen fast changing indexes build up to 100 segments. And if Solr's DocSet filter implementations need to call it again to avoid useless wrapping, that's 200 times per query (and 400 volatile reads) that aren't really needed. Sure, it would be hard to measure the improvement (hence the "minor"), but it's more about the chipping away at inefficiencies. I guess I'd turn it around and say "why is this volatile?"... there's no reason for it to be.
        Hide
        Robert Muir added a comment -

        Not a tight loop perhaps, but I've regularly seen fast changing indexes build up to 100 segments.

        So in this fast changing case doesn't it make more sense for solr to cache the filters without deleted docs and not do the extra liveDocs check?

        but it's more about the chipping away at inefficiencies.

        well i think it makes some assumptions dependent upon the current implementation?

        Deleted docs are written to the index and like other things ideally codec could customize the impl in the future.

        thats the only problem i have with it.

        separately: we call this a lot less since LUCENE-3474 patch. Previously if you had a BQ of 100 terms, it would call it 100 times per segment.
        now we call it once per segment in that case.

        Show
        Robert Muir added a comment - Not a tight loop perhaps, but I've regularly seen fast changing indexes build up to 100 segments. So in this fast changing case doesn't it make more sense for solr to cache the filters without deleted docs and not do the extra liveDocs check? but it's more about the chipping away at inefficiencies. well i think it makes some assumptions dependent upon the current implementation? Deleted docs are written to the index and like other things ideally codec could customize the impl in the future. thats the only problem i have with it. separately: we call this a lot less since LUCENE-3474 patch. Previously if you had a BQ of 100 terms, it would call it 100 times per segment. now we call it once per segment in that case.
        Hide
        Yonik Seeley added a comment -

        well i think it makes some assumptions dependent upon the current implementation?

        But I am talking about the implementation.
        "volatile" is part of the SegmentReader implementation - a now unnecessary part since it's no longer loaded on demand as it was in the past.

        Show
        Yonik Seeley added a comment - well i think it makes some assumptions dependent upon the current implementation? But I am talking about the implementation. "volatile" is part of the SegmentReader implementation - a now unnecessary part since it's no longer loaded on demand as it was in the past.
        Hide
        Simon Willnauer added a comment -

        I am not sure if we can simply remove the ensureOpen() call here. If we close the reader and I use it afterwards to get del docs I could see a null value which doesn't reflect the actual del docs. Maybe we shouldn't null this out on close? I actually wonder if we can make this somehow final together with other members?

        Show
        Simon Willnauer added a comment - I am not sure if we can simply remove the ensureOpen() call here. If we close the reader and I use it afterwards to get del docs I could see a null value which doesn't reflect the actual del docs. Maybe we shouldn't null this out on close? I actually wonder if we can make this somehow final together with other members?

          People

          • Assignee:
            Unassigned
            Reporter:
            Yonik Seeley
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:

              Development