Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-821

single norm file still uses up descriptors

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      The new index file format with a single .nrm file for all norms does not decrease file descriptor usage.
      The .nrm file is opened once for each field with norms in the index segment.

      1. norms.patch
        4 kB
        Yonik Seeley

        Activity

        Hide
        yseeley@gmail.com Yonik Seeley added a comment -

        Attaching patch.

        Changes:

        • Only open a single IndexInput for the .nrm file and reuse it for all
          norms sharing that file in the segment. No clone is needed since
          only a single norm is read at any time.
        • close norms when no longer needed (idea from Michael)
        • removed extra IndexInput clone() when reading the norms... why was this there originally?
        Show
        yseeley@gmail.com Yonik Seeley added a comment - Attaching patch. Changes: Only open a single IndexInput for the .nrm file and reuse it for all norms sharing that file in the segment. No clone is needed since only a single norm is read at any time. close norms when no longer needed (idea from Michael) removed extra IndexInput clone() when reading the norms... why was this there originally?
        Hide
        yseeley@gmail.com Yonik Seeley added a comment -

        > - removed extra IndexInput clone() when reading the norms... why was this there originally?

        Going back to the original SegmentReader implementation (2001), getNorms() wasn't synchronized, hence the clone of IndexInput was needed.

        Show
        yseeley@gmail.com Yonik Seeley added a comment - > - removed extra IndexInput clone() when reading the norms... why was this there originally? Going back to the original SegmentReader implementation (2001), getNorms() wasn't synchronized, hence the clone of IndexInput was needed.
        Hide
        mikemccand Michael McCandless added a comment -

        Whoa, you beat me to it: this was the next bug I was about to open!

        And I had worked through a patch, even setting in = null in the newly
        added Norm.close method

        And then also re-discovering that TestMultiSearcher closes its
        searcher and then keeps using it...

        Your patch looks perfect to me. +1

        I like the removal of the .clone() since norms(...) is synchronized.

        The one thing I was still trying to work out is if we could somehow
        close the singleNormStream once all Norm instances that share it had
        cached. It seems like a nice to have, but, since norms(...) is
        already synchronized we could have a simple refcount to track how many
        Norm instances still required it and then close the stream when that
        hits 0? This way we can free up 1 more file descriptor in certain
        cases.

        Show
        mikemccand Michael McCandless added a comment - Whoa, you beat me to it: this was the next bug I was about to open! And I had worked through a patch, even setting in = null in the newly added Norm.close method And then also re-discovering that TestMultiSearcher closes its searcher and then keeps using it... Your patch looks perfect to me. +1 I like the removal of the .clone() since norms(...) is synchronized. The one thing I was still trying to work out is if we could somehow close the singleNormStream once all Norm instances that share it had cached. It seems like a nice to have, but, since norms(...) is already synchronized we could have a simple refcount to track how many Norm instances still required it and then close the stream when that hits 0? This way we can free up 1 more file descriptor in certain cases.
        Hide
        yseeley@gmail.com Yonik Seeley added a comment -

        Yeah, I considered refcounting the input, but I'm not sure that it's worth it.
        The reason is predictability... since you don't know what queries are coming, you don't know when that descriptor will be closed, and hence need to plan for the upper bound on descriptor usage anyway.

        Show
        yseeley@gmail.com Yonik Seeley added a comment - Yeah, I considered refcounting the input, but I'm not sure that it's worth it. The reason is predictability... since you don't know what queries are coming, you don't know when that descriptor will be closed, and hence need to plan for the upper bound on descriptor usage anyway.
        Hide
        yseeley@gmail.com Yonik Seeley added a comment -

        committed. Thanks for the review Mike.

        Show
        yseeley@gmail.com Yonik Seeley added a comment - committed. Thanks for the review Mike.
        Hide
        mikemccand Michael McCandless added a comment -

        > The reason is predictability... since you don't know what queries are coming, you don't know when that descriptor will be closed, and hence need to plan for the upper bound on descriptor usage anyway.

        Agreed, the upper bound is what counts here, so it's not much help to reduce by one at some random later time.

        Show
        mikemccand Michael McCandless added a comment - > The reason is predictability... since you don't know what queries are coming, you don't know when that descriptor will be closed, and hence need to plan for the upper bound on descriptor usage anyway. Agreed, the upper bound is what counts here, so it's not much help to reduce by one at some random later time.
        Hide
        doronc Doron Cohen added a comment -

        wow.. I totally missed this point in 756. Fix here seems perfect to me too. Thanks for catching and fixing this Yonik.

        Show
        doronc Doron Cohen added a comment - wow.. I totally missed this point in 756. Fix here seems perfect to me too. Thanks for catching and fixing this Yonik.

          People

          • Assignee:
            yseeley@gmail.com Yonik Seeley
            Reporter:
            yseeley@gmail.com Yonik Seeley
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development