Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.4
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Add numDeletedDocs to IndexReader. Basically, the implementation is as simple as doing:
      public int numDeletedDocs() {
      return deletedDocs == null ? 0 : deletedDocs.count();
      }
      in SegmentReader.
      Patch to follow to include in all IndexReader extensions.

        Activity

        Hide
        otis Otis Gospodnetic added a comment -

        Sending CHANGES.txt
        Sending src/java/org/apache/lucene/index/IndexReader.java
        Transmitting file data ..
        Committed revision 695510.

        Show
        otis Otis Gospodnetic added a comment - Sending CHANGES.txt Sending src/java/org/apache/lucene/index/IndexReader.java Transmitting file data .. Committed revision 695510.
        Hide
        shaie Shai Erera added a comment -

        I agree with the body, that's what I had in mind.

        As for extending classes, I agree that calling two methods has little performance overhead, but it just looks cleaner (for SegmentReader for example). Anyway, I don't have a strong opinion on whether we should override or not. I'll be fine with either.

        Show
        shaie Shai Erera added a comment - I agree with the body, that's what I had in mind. As for extending classes, I agree that calling two methods has little performance overhead, but it just looks cleaner (for SegmentReader for example). Anyway, I don't have a strong opinion on whether we should override or not. I'll be fine with either.
        Hide
        mikemccand Michael McCandless added a comment -

        What if we implement numDeletedDocs() in IndexReader, instead of defining it abstract?

        Right, that's exactly what I'm thinking, with this body:

        public int numDeletedDocs() {
          return maxDoc() - numDocs();
        }
        

        Then I think no classes need to override it (perf cost of calling 2 methods is tiny)?

        Show
        mikemccand Michael McCandless added a comment - What if we implement numDeletedDocs() in IndexReader, instead of defining it abstract? Right, that's exactly what I'm thinking, with this body: public int numDeletedDocs() { return maxDoc() - numDocs(); } Then I think no classes need to override it (perf cost of calling 2 methods is tiny)?
        Hide
        shaie Shai Erera added a comment -

        What if we implement numDeletedDocs() in IndexReader, instead of defining it abstract?
        Those that extend IndexReader (outside the scope of the attached patch) can then choose to override the implementation or not.

        The purpose of the patch is to add an explicit method which developers can use, rather than understand the logic on maxDoc() - numDocs(). Not all extended classes implement it this way BTW. SegmentReader just calls deletedDocs.count(), rather then calling the two separate methods.

        Show
        shaie Shai Erera added a comment - What if we implement numDeletedDocs() in IndexReader, instead of defining it abstract? Those that extend IndexReader (outside the scope of the attached patch) can then choose to override the implementation or not. The purpose of the patch is to add an explicit method which developers can use, rather than understand the logic on maxDoc() - numDocs(). Not all extended classes implement it this way BTW. SegmentReader just calls deletedDocs.count(), rather then calling the two separate methods.
        Hide
        mikemccand Michael McCandless added a comment -

        Hmm – this breaks back compat (adds new abstract method to IndexReader).

        Why don't we fallback to default impl, in IndexReader, of maxDoc() - numDocs()? Patch is much less invasive, and, we don't break back compat? maxDoc() is indeed cheap.

        Show
        mikemccand Michael McCandless added a comment - Hmm – this breaks back compat (adds new abstract method to IndexReader). Why don't we fallback to default impl, in IndexReader, of maxDoc() - numDocs()? Patch is much less invasive, and, we don't break back compat? maxDoc() is indeed cheap.
        Hide
        otis Otis Gospodnetic added a comment -

        I think so - applies and compiles.

        Show
        otis Otis Gospodnetic added a comment - I think so - applies and compiles.
        Hide
        mikemccand Michael McCandless added a comment -

        Otis is this one ready to go in?

        Show
        mikemccand Michael McCandless added a comment - Otis is this one ready to go in?
        Hide
        otis Otis Gospodnetic added a comment -

        I think maxDoc() is a cheap call, so calling it twice won't be a performance killer, esp. since this is not something you'd call frequently, I imagine.

        However, I do agree about numDeletedDocs() being nice for hiding implementation details.

        Show
        otis Otis Gospodnetic added a comment - I think maxDoc() is a cheap call, so calling it twice won't be a performance killer, esp. since this is not something you'd call frequently, I imagine. However, I do agree about numDeletedDocs() being nice for hiding implementation details.
        Hide
        shaie Shai Erera added a comment -

        A very simple patch that implements numDeletedDocs in all the necessary readers.

        Show
        shaie Shai Erera added a comment - A very simple patch that implements numDeletedDocs in all the necessary readers.
        Hide
        shaie Shai Erera added a comment -

        This is an option, however it will result in two calls to maxDoc() (once by maxDoc() and another by numDocs()).
        Like I wrote, it's more of a convenience method and having a complete and clear API. This way, users of Lucene won't need to ask themselves how to obtain this number - they'll have an explicit API for that.

        Show
        shaie Shai Erera added a comment - This is an option, however it will result in two calls to maxDoc() (once by maxDoc() and another by numDocs()). Like I wrote, it's more of a convenience method and having a complete and clear API. This way, users of Lucene won't need to ask themselves how to obtain this number - they'll have an explicit API for that.
        Hide
        yseeley@gmail.com Yonik Seeley added a comment -

        How about just using maxDoc() - numDocs()?

        Show
        yseeley@gmail.com Yonik Seeley added a comment - How about just using maxDoc() - numDocs()?

          People

          • Assignee:
            otis Otis Gospodnetic
            Reporter:
            shaie Shai Erera
          • Votes:
            1 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development