Lucene - Core
  1. Lucene - Core
  2. LUCENE-5426

Make SortedSetDocValuesReaderState customizable

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 4.6
    • Fix Version/s: 4.7, Trunk
    • Component/s: modules/facet
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      We have a reader that have a different data structure (in memory) where the cost of computing ordinals per reader open is too expensive in the realtime setting.

      We are maintaining in memory data structure that supports all functionality and would like to leverage SortedSetDocValuesAccumulator.

      1. sortedsetreaderstate.patch
        21 kB
        John Wang
      2. sortedsetreaderstate.patch
        18 kB
        John Wang

        Activity

        Hide
        John Wang added a comment -

        Thanks Michael! Can't wait for release of 4.7

        Show
        John Wang added a comment - Thanks Michael! Can't wait for release of 4.7
        Hide
        Michael McCandless added a comment -

        Thanks John!

        Show
        Michael McCandless added a comment - Thanks John!
        Hide
        ASF subversion and git services added a comment -

        Commit 1565168 from Michael McCandless in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1565168 ]

        LUCENE-5426: allow customization of SortedSetDocValuesReaderState for Lucene doc values faceting

        Show
        ASF subversion and git services added a comment - Commit 1565168 from Michael McCandless in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1565168 ] LUCENE-5426 : allow customization of SortedSetDocValuesReaderState for Lucene doc values faceting
        Hide
        ASF subversion and git services added a comment -

        Commit 1565167 from Michael McCandless in branch 'dev/trunk'
        [ https://svn.apache.org/r1565167 ]

        LUCENE-5426: allow customization of SortedSetDocValuesReaderState for Lucene doc values faceting

        Show
        ASF subversion and git services added a comment - Commit 1565167 from Michael McCandless in branch 'dev/trunk' [ https://svn.apache.org/r1565167 ] LUCENE-5426 : allow customization of SortedSetDocValuesReaderState for Lucene doc values faceting
        Hide
        John Wang added a comment -

        Do you think this can make it to 4.7?

        Thanks

        -John

        Show
        John Wang added a comment - Do you think this can make it to 4.7? Thanks -John
        Hide
        Michael McCandless added a comment -

        +1, nice to make this "ord mapper" an extensions point / customizable.

        Show
        Michael McCandless added a comment - +1, nice to make this "ord mapper" an extensions point / customizable.
        Hide
        Shai Erera added a comment -

        Looks good to me. +1!

        Show
        Shai Erera added a comment - Looks good to me. +1!
        Hide
        John Wang added a comment -

        new patch diff'd against trunk.

        I incorporated Shai's comment in the patch as well.

        Show
        John Wang added a comment - new patch diff'd against trunk. I incorporated Shai's comment in the patch as well.
        Hide
        John Wang added a comment -

        This patch applies to 4.6 release branch. I will merge trunk and reapply patch.

        Show
        John Wang added a comment - This patch applies to 4.6 release branch. I will merge trunk and reapply patch.
        Hide
        Michael McCandless added a comment -

        It looks like this patch doesn't apply to 4.x; it applies pre LUCENE-5339?

        But +1 to allow custom impls of SSDVReaderState.

        Show
        Michael McCandless added a comment - It looks like this patch doesn't apply to 4.x; it applies pre LUCENE-5339 ? But +1 to allow custom impls of SSDVReaderState.
        Hide
        Shai Erera added a comment -

        I've got few questions:

        • Why is this code now in the accumulator:
        +    if (dv.getValueCount() > Integer.MAX_VALUE) {
        +      throw new IllegalArgumentException("can only handle valueCount < Integer.MAX_VALUE; got " + dv.getValueCount());
        +    }
        

        I see that it's still in DefaultSSDVReaderState, i.e. you cannot construct it if DV-count is more than Integer.MAX_VALUE. It also looks odd in the accumulator - it only uses it if the given FacetArrays are null?

        • Can you please make sure all the added getters are not called from inside loops, such as state.getIndexReader/separatorRegex?
        • Perhaps you should pull getSize() up to SSDVReaderState as well and use it instead of getDV().valueCount()? Just in case you can compute the size without obtaining the DV (i.e. lazy). Currently you're forced to pull a DV from the reader. If you do that, then please fix the Accumulator to use it too.

        Otherwise this looks good. The gist of this patch is that you made SSDVReaderState abstract (i.e. could have been an interface) and DefaultSSDVReaderState is the current concrete implementation, right?

        Show
        Shai Erera added a comment - I've got few questions: Why is this code now in the accumulator: + if (dv.getValueCount() > Integer .MAX_VALUE) { + throw new IllegalArgumentException( "can only handle valueCount < Integer .MAX_VALUE; got " + dv.getValueCount()); + } I see that it's still in DefaultSSDVReaderState, i.e. you cannot construct it if DV-count is more than Integer.MAX_VALUE. It also looks odd in the accumulator - it only uses it if the given FacetArrays are null ? Can you please make sure all the added getters are not called from inside loops, such as state.getIndexReader/separatorRegex? Perhaps you should pull getSize() up to SSDVReaderState as well and use it instead of getDV().valueCount()? Just in case you can compute the size without obtaining the DV (i.e. lazy). Currently you're forced to pull a DV from the reader. If you do that, then please fix the Accumulator to use it too. Otherwise this looks good. The gist of this patch is that you made SSDVReaderState abstract (i.e. could have been an interface) and DefaultSSDVReaderState is the current concrete implementation, right?
        Hide
        John Wang added a comment -

        You are right. Re-attached.

        Show
        John Wang added a comment - You are right. Re-attached.
        Hide
        Lei Wang added a comment -

        looks like the DefaultSortedSetDocsValuesReaderState.java is missing in the patch. forgot to attach?

        Show
        Lei Wang added a comment - looks like the DefaultSortedSetDocsValuesReaderState.java is missing in the patch. forgot to attach?

          People

          • Assignee:
            Michael McCandless
            Reporter:
            John Wang
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development