Details

    • Type: Sub-task Sub-task
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.1, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Currently this api is not very efficient, and always returns a mutable map.

      Can we change it to allow immutability? its sad we don't return Collections.emptyMap so its the same instance for all empty cases.

      1. LUCENE-6317.patch
        46 kB
        Robert Muir
      2. LUCENE-6317.patch
        43 kB
        Robert Muir
      3. LUCENE-6317.patch
        43 kB
        Robert Muir
      4. LUCENE-6317.patch
        40 kB
        Robert Muir
      5. LUCENE-6317.patch
        39 kB
        Robert Muir
      6. LUCENE-6317.patch
        9 kB
        Robert Muir

        Activity

        Hide
        Robert Muir added a comment -

        I think we should deprecate the current readStringStringMap/readStringSet and define new ones that:

        • write the size as vint (int is stupid here)
        • always return immutable maps/sets. this just matches how lucene works.
        Show
        Robert Muir added a comment - I think we should deprecate the current readStringStringMap/readStringSet and define new ones that: write the size as vint (int is stupid here) always return immutable maps/sets. this just matches how lucene works.
        Hide
        Robert Muir added a comment -

        This means bumping formats of .si/.fis/segments_N, but I think its the right thing to do. Once a segment is read, the structures encoded in these maps should not be modified by anyone.

        Alternatively, we could keep the format the same, but then we leave the bloated 4 byte size, we change behavior in a scary way of an existing method, etc. To me this is not better, only easier.

        Show
        Robert Muir added a comment - This means bumping formats of .si/.fis/segments_N, but I think its the right thing to do. Once a segment is read, the structures encoded in these maps should not be modified by anyone. Alternatively, we could keep the format the same, but then we leave the bloated 4 byte size, we change behavior in a scary way of an existing method, etc. To me this is not better, only easier.
        Hide
        Robert Muir added a comment -

        Here are my proposed new methods and simple tests. They also remove the leniency of null parameters at write.

        If this is ok, I can work on fixing .si/.fis/.sis to use them.

        Show
        Robert Muir added a comment - Here are my proposed new methods and simple tests. They also remove the leniency of null parameters at write. If this is ok, I can work on fixing .si/.fis/.sis to use them.
        Hide
        Michael McCandless added a comment -

        +1 for this approach, thanks Rob.

        Show
        Michael McCandless added a comment - +1 for this approach, thanks Rob.
        Hide
        Robert Muir added a comment -

        Updated patch. I bumped segments_N, .fis, and .si to use the new methods.

        In general i tried to push immutability / non-nullness into the corresponding apis more too.

        I think this is safer and cleaner and helps with the memory issues Mike wants to tackle.

        Show
        Robert Muir added a comment - Updated patch. I bumped segments_N, .fis, and .si to use the new methods. In general i tried to push immutability / non-nullness into the corresponding apis more too. I think this is safer and cleaner and helps with the memory issues Mike wants to tackle.
        Hide
        Robert Muir added a comment -

        Just includes the fix for memoryindex. It was passing null where it should pass Collections.emptyMap()

        Show
        Robert Muir added a comment - Just includes the fix for memoryindex. It was passing null where it should pass Collections.emptyMap()
        Hide
        Michael McCandless added a comment -

        In Lucene50SegmentInfoFormat.java you can move the Collections.unmodifiableMap under the else clause of if (format >= VERSION_SAFE_MAPS) ? Can those other two maps be immutable?

        I see new HashMap<>() sometimes and Collections.emptyMap other times.

        Is there any way to assert the incoming maps to SI and FI are in fact immutable, besides trying to add something to them?

        Show
        Michael McCandless added a comment - In Lucene50SegmentInfoFormat.java you can move the Collections.unmodifiableMap under the else clause of if (format >= VERSION_SAFE_MAPS) ? Can those other two maps be immutable? I see new HashMap<>() sometimes and Collections.emptyMap other times. Is there any way to assert the incoming maps to SI and FI are in fact immutable, besides trying to add something to them?
        Hide
        Robert Muir added a comment -

        In Lucene50SegmentInfoFormat.java you can move the Collections.unmodifiableMap under the else clause of if (format >= VERSION_SAFE_MAPS) ? Can those other two maps be immutable?

        Yes, thank you!

        I see new HashMap<>() sometimes and Collections.emptyMap other times.
        Is there any way to assert the incoming maps to SI and FI are in fact immutable, besides trying to add something to them?

        That is because these classes (SI/FI) are mutable sometimes (e.g. when created by indexwriter), and other times immutable (when read by codec).
        This is not a problem I can solve underneath this issue.

        Show
        Robert Muir added a comment - In Lucene50SegmentInfoFormat.java you can move the Collections.unmodifiableMap under the else clause of if (format >= VERSION_SAFE_MAPS) ? Can those other two maps be immutable? Yes, thank you! I see new HashMap<>() sometimes and Collections.emptyMap other times. Is there any way to assert the incoming maps to SI and FI are in fact immutable, besides trying to add something to them? That is because these classes (SI/FI) are mutable sometimes (e.g. when created by indexwriter), and other times immutable (when read by codec). This is not a problem I can solve underneath this issue.
        Hide
        Robert Muir added a comment -

        Adds more safety for old .si in Lucene50SegmentInfoFormat.

        Show
        Robert Muir added a comment - Adds more safety for old .si in Lucene50SegmentInfoFormat.
        Hide
        Robert Muir added a comment -

        optimize the memory a bit more since we are immutable. Use treeset/map when the size is small to save ~64 bytes.

        Show
        Robert Muir added a comment - optimize the memory a bit more since we are immutable. Use treeset/map when the size is small to save ~64 bytes.
        Hide
        Michael McCandless added a comment -

        This is not a problem I can solve underneath this issue.

        OK good: out of scope!

        Last patch looks great: +1

        Thanks Rob.

        Show
        Michael McCandless added a comment - This is not a problem I can solve underneath this issue. OK good: out of scope! Last patch looks great: +1 Thanks Rob.
        Hide
        Robert Muir added a comment -

        added tests to BaseFieldInfoFormatTestCase and BaseSegmentInfoFormatTestCase that these maps are immutable.

        Show
        Robert Muir added a comment - added tests to BaseFieldInfoFormatTestCase and BaseSegmentInfoFormatTestCase that these maps are immutable.
        Hide
        ASF subversion and git services added a comment -

        Commit 1663157 from Robert Muir in branch 'dev/trunk'
        [ https://svn.apache.org/r1663157 ]

        LUCENE-6317: Fix readStringStringMap API, reduce memory usage

        Show
        ASF subversion and git services added a comment - Commit 1663157 from Robert Muir in branch 'dev/trunk' [ https://svn.apache.org/r1663157 ] LUCENE-6317 : Fix readStringStringMap API, reduce memory usage
        Hide
        ASF subversion and git services added a comment -

        Commit 1663194 from Robert Muir in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1663194 ]

        LUCENE-6317: Fix readStringStringMap API, reduce memory usage

        Show
        ASF subversion and git services added a comment - Commit 1663194 from Robert Muir in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1663194 ] LUCENE-6317 : Fix readStringStringMap API, reduce memory usage
        Hide
        Timothy Potter added a comment -

        Bulk close after 5.1 release

        Show
        Timothy Potter added a comment - Bulk close after 5.1 release

          People

          • Assignee:
            Unassigned
            Reporter:
            Robert Muir
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development