Lucene - Core
  1. Lucene - Core
  2. LUCENE-5157

Refactoring MultiDocValues.OrdinalMap to clarify API and internal structure.

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.6
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      I refactored MultiDocValues.OrdinalMap, removing one unused parameter and renaming some methods to more clearly communicate what they do. Also I renamed subIndex references to segmentIndex.

        Activity

        Hide
        Michael McCandless added a comment -

        +1, these look like nice renamings!

        Show
        Michael McCandless added a comment - +1, these look like nice renamings!
        Hide
        Robert Muir added a comment -

        The reasoning for the current naming is because it doesnt know anything about segments: it just works like a MultiTermsEnum (it takes TermsEnum[]).

        Show
        Robert Muir added a comment - The reasoning for the current naming is because it doesnt know anything about segments: it just works like a MultiTermsEnum (it takes TermsEnum[]).
        Hide
        Robert Muir added a comment -

        I also dont think external APIs should reflect internal structure: a separation is usually considered a good thing.

        The current naming/parameterization was intentionally this way...

        Show
        Robert Muir added a comment - I also dont think external APIs should reflect internal structure: a separation is usually considered a good thing. The current naming/parameterization was intentionally this way...
        Hide
        Boaz Leskes added a comment -

        I agree with the principle however the current implementation was already using segments in the API. For example:

        public long getSegmentOrd(int subIndex, long globalOrd)

        In which I removed the first parameter which was not used and renamed it to:

        public long getFirstSegmentOrd(long globalOrd)

        this to communicate the segment of which you get the ordinal.

        Show
        Boaz Leskes added a comment - I agree with the principle however the current implementation was already using segments in the API. For example: public long getSegmentOrd(int subIndex, long globalOrd) In which I removed the first parameter which was not used and renamed it to: public long getFirstSegmentOrd(long globalOrd) this to communicate the segment of which you get the ordinal.
        Hide
        Adrien Grand added a comment -

        I like the getSegmentOrd change. Regarding the naming, I'm undecided but I think it's nice to use segment even if it is less generic, so that we don't have to use another word such as "index" to identify the TermsEnum, which could be confusing with "ord"?

        Show
        Adrien Grand added a comment - I like the getSegmentOrd change. Regarding the naming, I'm undecided but I think it's nice to use segment even if it is less generic, so that we don't have to use another word such as "index" to identify the TermsEnum, which could be confusing with "ord"?
        Hide
        Robert Muir added a comment -

        I can say for sure I'm not happy with the API And its bogus its inconsistent about subIndex/segment, so we should fix it one way or the other.

        A few things i hate about the API for the record:

        • should it really be in multitermsenum? if you have N-ord'enabled termsenums (regardless if they come from DV or not), you could use this thing. on the other hand the fact it uses termsenum at all could be seen as an impl detail of merging (sorted)set dictionaries.
        • i hate the 'object' owner and hairy slow-wrapper caching. on the other hand, I felt it would be a trap to be anywhere else, because it can have real cost and is not per-segment, so slow-wrapper feels like the right place.

        So my comment about subindex had more to do with the first problem: do we think it should be seen as a termsenum-merger or is it just a dv thing?

        Show
        Robert Muir added a comment - I can say for sure I'm not happy with the API And its bogus its inconsistent about subIndex/segment, so we should fix it one way or the other. A few things i hate about the API for the record: should it really be in multitermsenum? if you have N-ord'enabled termsenums (regardless if they come from DV or not), you could use this thing. on the other hand the fact it uses termsenum at all could be seen as an impl detail of merging (sorted)set dictionaries. i hate the 'object' owner and hairy slow-wrapper caching. on the other hand, I felt it would be a trap to be anywhere else, because it can have real cost and is not per-segment, so slow-wrapper feels like the right place. So my comment about subindex had more to do with the first problem: do we think it should be seen as a termsenum-merger or is it just a dv thing?
        Hide
        Adrien Grand added a comment -

        on the other hand the fact it uses termsenum at all could be seen as an impl detail of merging (sorted)set dictionaries

        I tend to see it this way since ordinals are more useful on doc values than on an inverted index's terms dictionary?

        Show
        Adrien Grand added a comment - on the other hand the fact it uses termsenum at all could be seen as an impl detail of merging (sorted)set dictionaries I tend to see it this way since ordinals are more useful on doc values than on an inverted index's terms dictionary?
        Hide
        Adrien Grand added a comment -

        I discussed about this issue with Robert to see how we can move forward:

        • moving OrdinalMap to MultiTermsEnum can be controversial as Robert explained so let's only tackle the naming and getSegmentOrd API issues here,
        • another option to make getSegmentOrd less trappy is to add an assertion that the provided segment number is the same as the one returned by getSegmentNumber, this would allow for returning the segment ordinals on any segment in the future without changing the API,
        • renaming subIndex to segment is ok as it makes the naming more consistent.

        Robert, please correct me if you think it doesn't reflect correctly what we said.
        Boaz, what do you think?

        Show
        Adrien Grand added a comment - I discussed about this issue with Robert to see how we can move forward: moving OrdinalMap to MultiTermsEnum can be controversial as Robert explained so let's only tackle the naming and getSegmentOrd API issues here, another option to make getSegmentOrd less trappy is to add an assertion that the provided segment number is the same as the one returned by getSegmentNumber , this would allow for returning the segment ordinals on any segment in the future without changing the API, renaming subIndex to segment is ok as it makes the naming more consistent. Robert, please correct me if you think it doesn't reflect correctly what we said. Boaz, what do you think?
        Hide
        Robert Muir added a comment -

        +1, lets improve it for now and not expand it to try to be a "general" termsenum merger. but on the other hand, i am still not convinced we can't improve the efficiency of this thing, so its good if we can prevent innards from being too exposed (unless its causing some use case an actual problem)

        Show
        Robert Muir added a comment - +1, lets improve it for now and not expand it to try to be a "general" termsenum merger. but on the other hand, i am still not convinced we can't improve the efficiency of this thing, so its good if we can prevent innards from being too exposed (unless its causing some use case an actual problem)
        Hide
        Boaz Leskes added a comment -

        Hi,

        Sorry for the delay. I think it's a good idea to separate the push for a more generic class. Until the point that is done and properly documented, I would opt for clear naming in the current context. The class is marked both internal and experimental, so backward compatibility issues shouldn't be a problem in the future (if I understand the policies correctly).

        If I understand the suggestion correctly, this is how the getSegmentOrd should look like:

            public long getSegmentOrd(int segment, long globalOrd) {
               assert segment == getFirstSegmentNumber(globalOrd);
               return globalOrd - globalOrdDeltas.get(globalOrd);
            }
        

        In my opinion this is not an ideal situation because the name and the signature suggest you can get a per segment/subindex ordinal for any segment only in runtime to fire an assertion (if enabled). My vote would go to communicate the functionality better in the name and not offer a segment parameter that is can only have one value.

        Show
        Boaz Leskes added a comment - Hi, Sorry for the delay. I think it's a good idea to separate the push for a more generic class. Until the point that is done and properly documented, I would opt for clear naming in the current context. The class is marked both internal and experimental, so backward compatibility issues shouldn't be a problem in the future (if I understand the policies correctly). If I understand the suggestion correctly, this is how the getSegmentOrd should look like: public long getSegmentOrd( int segment, long globalOrd) { assert segment == getFirstSegmentNumber(globalOrd); return globalOrd - globalOrdDeltas.get(globalOrd); } In my opinion this is not an ideal situation because the name and the signature suggest you can get a per segment/subindex ordinal for any segment only in runtime to fire an assertion (if enabled). My vote would go to communicate the functionality better in the name and not offer a segment parameter that is can only have one value.
        Hide
        Adrien Grand added a comment -

        This issue has been stalled for a few months now. Looking back at it, I think that the changes that Boaz proposes make the API easier to understand. It might be less general but this class is experimental so it will be possible to change the API again in the future is we want.

        I propose to commit the patch. If there are objections, I'll just close this issue as "Won't fix" and commit the suggested assertion in MultiOrdinals.getSegmentOrd.

        Show
        Adrien Grand added a comment - This issue has been stalled for a few months now. Looking back at it, I think that the changes that Boaz proposes make the API easier to understand. It might be less general but this class is experimental so it will be possible to change the API again in the future is we want. I propose to commit the patch. If there are objections, I'll just close this issue as "Won't fix" and commit the suggested assertion in MultiOrdinals.getSegmentOrd.
        Hide
        Robert Muir added a comment -

        I thought i gave a +1 above

        Not an objection just reiterating my previous hesitation: for example if I had that the time I would patch query-time join to use this class to support global ordinals across even different readers instead of huge hashmaps of terms, I think this would be much faster as its just an int<->int join.

        Then the segment number stuff might look wierd and the old API more intuitive, but this patch is fine for now!

        Show
        Robert Muir added a comment - I thought i gave a +1 above Not an objection just reiterating my previous hesitation: for example if I had that the time I would patch query-time join to use this class to support global ordinals across even different readers instead of huge hashmaps of terms, I think this would be much faster as its just an int<->int join. Then the segment number stuff might look wierd and the old API more intuitive, but this patch is fine for now!
        Hide
        ASF subversion and git services added a comment -

        Commit 1536605 from Adrien Grand in branch 'dev/trunk'
        [ https://svn.apache.org/r1536605 ]

        LUCENE-5157: Rename OrdinalMap methods to clarify API and internal structure.

        Show
        ASF subversion and git services added a comment - Commit 1536605 from Adrien Grand in branch 'dev/trunk' [ https://svn.apache.org/r1536605 ] LUCENE-5157 : Rename OrdinalMap methods to clarify API and internal structure.
        Hide
        ASF subversion and git services added a comment -

        Commit 1536607 from Adrien Grand in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1536607 ]

        LUCENE-5157: Rename OrdinalMap methods to clarify API and internal structure.

        Show
        ASF subversion and git services added a comment - Commit 1536607 from Adrien Grand in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1536607 ] LUCENE-5157 : Rename OrdinalMap methods to clarify API and internal structure.
        Hide
        Adrien Grand added a comment -

        I thought i gave a +1 above

        Oh, I thought the +1 was only for not making this class a generic termsenum merger!

        I just committed the patch, thanks Boaz!

        Show
        Adrien Grand added a comment - I thought i gave a +1 above Oh, I thought the +1 was only for not making this class a generic termsenum merger! I just committed the patch, thanks Boaz!

          People

          • Assignee:
            Adrien Grand
            Reporter:
            Boaz Leskes
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development