Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.1, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Currently this api is not very efficient, and always returns a mutable map.

      Can we change it to allow immutability? its sad we don't return Collections.emptyMap so its the same instance for all empty cases.

      1. LUCENE-6317.patch
        46 kB
        Robert Muir
      2. LUCENE-6317.patch
        43 kB
        Robert Muir
      3. LUCENE-6317.patch
        43 kB
        Robert Muir
      4. LUCENE-6317.patch
        40 kB
        Robert Muir
      5. LUCENE-6317.patch
        39 kB
        Robert Muir
      6. LUCENE-6317.patch
        9 kB
        Robert Muir

        Activity

        Hide
        rcmuir Robert Muir added a comment -

        I think we should deprecate the current readStringStringMap/readStringSet and define new ones that:

        • write the size as vint (int is stupid here)
        • always return immutable maps/sets. this just matches how lucene works.
        Show
        rcmuir Robert Muir added a comment - I think we should deprecate the current readStringStringMap/readStringSet and define new ones that: write the size as vint (int is stupid here) always return immutable maps/sets. this just matches how lucene works.
        Hide
        rcmuir Robert Muir added a comment -

        This means bumping formats of .si/.fis/segments_N, but I think its the right thing to do. Once a segment is read, the structures encoded in these maps should not be modified by anyone.

        Alternatively, we could keep the format the same, but then we leave the bloated 4 byte size, we change behavior in a scary way of an existing method, etc. To me this is not better, only easier.

        Show
        rcmuir Robert Muir added a comment - This means bumping formats of .si/.fis/segments_N, but I think its the right thing to do. Once a segment is read, the structures encoded in these maps should not be modified by anyone. Alternatively, we could keep the format the same, but then we leave the bloated 4 byte size, we change behavior in a scary way of an existing method, etc. To me this is not better, only easier.
        Hide
        rcmuir Robert Muir added a comment -

        Here are my proposed new methods and simple tests. They also remove the leniency of null parameters at write.

        If this is ok, I can work on fixing .si/.fis/.sis to use them.

        Show
        rcmuir Robert Muir added a comment - Here are my proposed new methods and simple tests. They also remove the leniency of null parameters at write. If this is ok, I can work on fixing .si/.fis/.sis to use them.
        Hide
        mikemccand Michael McCandless added a comment -

        +1 for this approach, thanks Rob.

        Show
        mikemccand Michael McCandless added a comment - +1 for this approach, thanks Rob.
        Hide
        rcmuir Robert Muir added a comment -

        Updated patch. I bumped segments_N, .fis, and .si to use the new methods.

        In general i tried to push immutability / non-nullness into the corresponding apis more too.

        I think this is safer and cleaner and helps with the memory issues Mike wants to tackle.

        Show
        rcmuir Robert Muir added a comment - Updated patch. I bumped segments_N, .fis, and .si to use the new methods. In general i tried to push immutability / non-nullness into the corresponding apis more too. I think this is safer and cleaner and helps with the memory issues Mike wants to tackle.
        Hide
        rcmuir Robert Muir added a comment -

        Just includes the fix for memoryindex. It was passing null where it should pass Collections.emptyMap()

        Show
        rcmuir Robert Muir added a comment - Just includes the fix for memoryindex. It was passing null where it should pass Collections.emptyMap()
        Hide
        mikemccand Michael McCandless added a comment -

        In Lucene50SegmentInfoFormat.java you can move the Collections.unmodifiableMap under the else clause of if (format >= VERSION_SAFE_MAPS) ? Can those other two maps be immutable?

        I see new HashMap<>() sometimes and Collections.emptyMap other times.

        Is there any way to assert the incoming maps to SI and FI are in fact immutable, besides trying to add something to them?

        Show
        mikemccand Michael McCandless added a comment - In Lucene50SegmentInfoFormat.java you can move the Collections.unmodifiableMap under the else clause of if (format >= VERSION_SAFE_MAPS) ? Can those other two maps be immutable? I see new HashMap<>() sometimes and Collections.emptyMap other times. Is there any way to assert the incoming maps to SI and FI are in fact immutable, besides trying to add something to them?
        Hide
        rcmuir Robert Muir added a comment -

        In Lucene50SegmentInfoFormat.java you can move the Collections.unmodifiableMap under the else clause of if (format >= VERSION_SAFE_MAPS) ? Can those other two maps be immutable?

        Yes, thank you!

        I see new HashMap<>() sometimes and Collections.emptyMap other times.
        Is there any way to assert the incoming maps to SI and FI are in fact immutable, besides trying to add something to them?

        That is because these classes (SI/FI) are mutable sometimes (e.g. when created by indexwriter), and other times immutable (when read by codec).
        This is not a problem I can solve underneath this issue.

        Show
        rcmuir Robert Muir added a comment - In Lucene50SegmentInfoFormat.java you can move the Collections.unmodifiableMap under the else clause of if (format >= VERSION_SAFE_MAPS) ? Can those other two maps be immutable? Yes, thank you! I see new HashMap<>() sometimes and Collections.emptyMap other times. Is there any way to assert the incoming maps to SI and FI are in fact immutable, besides trying to add something to them? That is because these classes (SI/FI) are mutable sometimes (e.g. when created by indexwriter), and other times immutable (when read by codec). This is not a problem I can solve underneath this issue.
        Hide
        rcmuir Robert Muir added a comment -

        Adds more safety for old .si in Lucene50SegmentInfoFormat.

        Show
        rcmuir Robert Muir added a comment - Adds more safety for old .si in Lucene50SegmentInfoFormat.
        Hide
        rcmuir Robert Muir added a comment -

        optimize the memory a bit more since we are immutable. Use treeset/map when the size is small to save ~64 bytes.

        Show
        rcmuir Robert Muir added a comment - optimize the memory a bit more since we are immutable. Use treeset/map when the size is small to save ~64 bytes.
        Hide
        mikemccand Michael McCandless added a comment -

        This is not a problem I can solve underneath this issue.

        OK good: out of scope!

        Last patch looks great: +1

        Thanks Rob.

        Show
        mikemccand Michael McCandless added a comment - This is not a problem I can solve underneath this issue. OK good: out of scope! Last patch looks great: +1 Thanks Rob.
        Hide
        rcmuir Robert Muir added a comment -

        added tests to BaseFieldInfoFormatTestCase and BaseSegmentInfoFormatTestCase that these maps are immutable.

        Show
        rcmuir Robert Muir added a comment - added tests to BaseFieldInfoFormatTestCase and BaseSegmentInfoFormatTestCase that these maps are immutable.
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 1663157 from Robert Muir in branch 'dev/trunk'
        [ https://svn.apache.org/r1663157 ]

        LUCENE-6317: Fix readStringStringMap API, reduce memory usage

        Show
        jira-bot ASF subversion and git services added a comment - Commit 1663157 from Robert Muir in branch 'dev/trunk' [ https://svn.apache.org/r1663157 ] LUCENE-6317 : Fix readStringStringMap API, reduce memory usage
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 1663194 from Robert Muir in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1663194 ]

        LUCENE-6317: Fix readStringStringMap API, reduce memory usage

        Show
        jira-bot ASF subversion and git services added a comment - Commit 1663194 from Robert Muir in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1663194 ] LUCENE-6317 : Fix readStringStringMap API, reduce memory usage
        Hide
        thelabdude Timothy Potter added a comment -

        Bulk close after 5.1 release

        Show
        thelabdude Timothy Potter added a comment - Bulk close after 5.1 release

          People

          • Assignee:
            Unassigned
            Reporter:
            rcmuir Robert Muir
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development