Lucene - Core
  1. Lucene - Core
  2. LUCENE-1600

Reduce usage of String.intern(), performance is terrible

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 2.4, 2.4.1
    • Fix Version/s: 2.9
    • Component/s: None
    • Labels:
      None
    • Environment:

      Windows Server 2003 x64
      Hotspot JDK 1.6.0_12 64-bit

    • Lucene Fields:
      New, Patch Available

      Description

      I profiled a simple MatchAllDocsQuery() against ~1.5 million documents (8 fields of short text, Field.Store.YES,Field.Index.NOT_ANALYZED_NO_NORMS), then retrieved all documents via searcher.doc(i, fs). String.intern() showed up as a top hotspot (see attached screenshot), so i implemented a small optimization to not intern() for every new Field(), instead forcing the intern in the FieldInfos class and adding a optional "internName" constructor to Field. This reduced execution time for searching and iterating through all documents by 35%. Results were similar for -server and -client.

      TRUNK (2.9) w/out patch: matched 1435563 in 8884 ms/search
      TRUNK (2.9) w/patch: matched 1435563 in 5786 ms/search

      1. intern_perf.patch
        3 kB
        Patrick Eger
      2. intern.png
        49 kB
        Patrick Eger

        Issue Links

          Activity

          Hide
          Patrick Eger added a comment -

          attaching profiler screenshot and patch

          Show
          Patrick Eger added a comment - attaching profiler screenshot and patch
          Hide
          Patrick Eger added a comment -

          note that there may be other opportunities to reduce interning, i fixed it only for my specific use-case

          Show
          Patrick Eger added a comment - note that there may be other opportunities to reduce interning, i fixed it only for my specific use-case
          Hide
          Michael McCandless added a comment -

          Thanks for the fix, P. I'll commit this for 2.9.

          Show
          Michael McCandless added a comment - Thanks for the fix, P. I'll commit this for 2.9.
          Hide
          Jason Rutherglen added a comment -

          contrib/MemoryIndex has a bunch of notes about how interning is
          slow, and using (I believe) hashmaps of strings is better.
          Comments on this approach?

          Show
          Jason Rutherglen added a comment - contrib/MemoryIndex has a bunch of notes about how interning is slow, and using (I believe) hashmaps of strings is better. Comments on this approach?
          Hide
          Patrick Eger added a comment -

          Hashmaps would work also, but then they either need to be synchronized or kept per-thread, the former would probably kill all your performance gains and the latter would be annoying i think. A moderate usage of String.intern() is fine i think, my patch just takes it out of the hot-path (for my use-case at least). Other uses of String.intern() in the codebase may have different solutions/tradeoffs certainly.

          Show
          Patrick Eger added a comment - Hashmaps would work also, but then they either need to be synchronized or kept per-thread, the former would probably kill all your performance gains and the latter would be annoying i think. A moderate usage of String.intern() is fine i think, my patch just takes it out of the hot-path (for my use-case at least). Other uses of String.intern() in the codebase may have different solutions/tradeoffs certainly.
          Hide
          Uwe Schindler added a comment - - edited

          In addition to Mikes fixes, there are more places in FieldsReader, where intern() is used. The best would be to add the sme ctor to AbstractField, too and use it for LayzyField and so on, too.
          If I have time, I attach a patch similar to Patrick's.

          Show
          Uwe Schindler added a comment - - edited In addition to Mikes fixes, there are more places in FieldsReader, where intern() is used. The best would be to add the sme ctor to AbstractField, too and use it for LayzyField and so on, too. If I have time, I attach a patch similar to Patrick's.

            People

            • Assignee:
              Michael McCandless
              Reporter:
              Patrick Eger
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development