Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.9, 2.0.0, 2.1, 2.2
    • Fix Version/s: 2.3
    • Component/s: modules/analysis
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

The ISOLatin1AccentFilter is a bit slow, giving 300+ ms response times when used in a highlighter for output responses.

      Patch to follow

      1. fasterisoremove1.patch
        9 kB
        Mark Miller
      2. fasterisoremove2.patch
        9 kB
        Mark Miller
      3. ISOLatin1AccentFilter.java.patch
        5 kB
        Ian Boston
      4. ISOLatin1AccentFilterAlt.java
        7 kB
        Dawid Weiss
      5. LUCENE-871.take4.patch
        9 kB
        Michael McCandless

        Activity

        ianeboston Ian Boston added a comment -

        Patch to improve performance,

This was re-created from the improved source, and although I don't think I have made any transcription mistakes, I might have; please check that it builds and works OK.

        yseeley@gmail.com Yonik Seeley added a comment -

        Patch looks fine after a quick review.

        To speed things up a bit more in the case that there are few accented chars in a large field, one could start the second loop at the place where the first loop found a char in the range of accented chars. The first part could be efficiently copied via String.getChars()
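Yonik's suggestion can be sketched roughly like this. This is a minimal illustration, not the actual patch: the accented-character range check and the handful of mappings shown are simplified assumptions, and the class name is hypothetical.

```java
// Sketch: pre-scan for the first possibly-accented char; if none, return the
// input untouched (fail fast). Otherwise bulk-copy the clean prefix with
// String.getChars() and only transliterate from that point on.
public class AccentStripSketch {
    public static String removeAccents(String input) {
        int first = -1;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c >= '\u00C0' && c <= '\u0178') { first = i; break; }
        }
        if (first == -1) return input;              // nothing to do

        char[] out = new char[input.length() * 2];  // worst case: one char -> two (AE, oe)
        input.getChars(0, first, out, 0);           // bulk-copy the clean prefix
        int len = first;
        for (int i = first; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {                            // illustrative subset of mappings
                case '\u00C6': out[len++] = 'A'; out[len++] = 'E'; break; // AE ligature
                case '\u00C9': out[len++] = 'E'; break;
                case '\u00E9': out[len++] = 'e'; break;
                default:       out[len++] = c;
            }
        }
        return new String(out, 0, len);
    }
}
```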

        markrmiller@gmail.com Mark Miller added a comment -

        It would be a nice courtesy to document that this thing is over 2 times faster if you use a StringBuilder rather than a StringBuffer. Up to 2.8 times faster in my measurements.
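The gap Mark describes can be illustrated with a rough micro-benchmark: StringBuffer synchronizes every append, while StringBuilder (Java 5+) does not. The class and the workload here are illustrative, and actual timings vary by JVM.

```java
// Appends one char at a time, the access pattern the filter uses.
public class BufferVsBuilder {
    public static String withBuffer(String s) {
        StringBuffer sb = new StringBuffer(s.length());
        for (int i = 0; i < s.length(); i++) sb.append(s.charAt(i));
        return sb.toString();
    }
    public static String withBuilder(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) sb.append(s.charAt(i));
        return sb.toString();
    }
    public static void main(String[] args) {
        String text = "Des mots cl\u00E9s \u00E0 la cha\u00EEne";
        long t0 = System.nanoTime();
        for (int i = 0; i < 100000; i++) withBuffer(text);
        long t1 = System.nanoTime();
        for (int i = 0; i < 100000; i++) withBuilder(text);
        long t2 = System.nanoTime();
        System.out.println("StringBuffer:  " + (t1 - t0) / 1000000 + " ms");
        System.out.println("StringBuilder: " + (t2 - t1) / 1000000 + " ms");
    }
}
```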

        • Mark
        gsingers Grant Ingersoll added a comment -

        Right, but StringBuilder is 1.5. Sigh...

        markrmiller@gmail.com Mark Miller added a comment -

Yes, I feel that sigh. So perhaps the point is moot. I was just thinking that a lot of code runs on 1.5 or later, and while it's obviously crazy to point out every class that could be changed to a StringBuilder, I thought it might be nice to warn somehow that this thing is about twice as fast if you can use StringBuilder. Even this sped-up version takes more than its fair share in my filter chain.

        • Mark
        mikemccand Michael McCandless added a comment -

        I think we can likely get a sizable speedup here by using the char[] termBuffer in Token plus the new "re-use single Token" API being discussed here:

        http://www.gossamer-threads.com/lists/lucene/java-dev/51283
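The reuse idea can be sketched generically. Note this is not the actual Lucene Token API; it only illustrates writing normalized text into a persistent char[] that is kept across calls and grown on demand, instead of allocating a new String per token.

```java
// A reusable output buffer: no per-token String or StringBuffer allocation.
public class ReusableOutput {
    private char[] buf = new char[64];
    private int len;

    public char[] buffer() { return buf; }
    public int length() { return len; }

    /** Writes a normalized copy of in[0..inLen) into the reusable buffer.
     *  The accent mappings shown are an illustrative subset. */
    public void normalize(char[] in, int inLen) {
        if (buf.length < inLen * 2)            // worst case: one char -> two
            buf = new char[inLen * 2];
        len = 0;
        for (int i = 0; i < inLen; i++) {
            char c = in[i];
            switch (c) {
                case '\u00E9': buf[len++] = 'e'; break;
                case '\u00E8': buf[len++] = 'e'; break;
                default:       buf[len++] = c;
            }
        }
    }
}
```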

        markrmiller@gmail.com Mark Miller added a comment -

Here is a 1.4 solution using a char array. It doesn't yet use the termBuffer or create only a single token.

        Almost twice as fast.

        • Mark
        markrmiller@gmail.com Mark Miller added a comment -

        Woah! I thought I was seeing this patch on the trunk already. Should have looked more closely at the patch file. I basically just duplicated the patch. Belay all previous.

markrmiller@gmail.com Mark Miller added a comment (edited) -

Since I prefer my char array handling to the current one, I have taken the fail-fast pre-scan loop and Yonik's suggestion and incorporated them into my patch. Two choices now.

        • Mark
        mikemccand Michael McCandless added a comment -

        OK, for LUCENE-969 I made yet a 3rd option for optimizing
        ISOLatin1AccentFilter.

        In that patch I reuse the Token instance, using the char[] API for the
        Token's text instead of String, and I also re-use a single TokenStream
        instance (I did this for all core tokenizers).

        I just tested total time to tokenize all wikipedia content with
        current trunk (1116 sec) vs with LUCENE-969 (500 sec), with a
        WhitespaceTokenizer -> ISOLatin1AccentFilter chain.

        I separately timed just creating the documents at 112 sec, to subtract
        it off from the above times (so I can measure only cost of
        tokenization).

This gives a net speedup for this filter of 2.97X (1004 sec -> 388 sec).

        mikemccand Michael McCandless added a comment -

I think the changes that were part of LUCENE-969 (giving a 2.97X speedup of ISOLatin1AccentFilter) have resolved this issue?

        markrmiller@gmail.com Mark Miller added a comment -

Pretty sure you still want this guy's patch. It avoids a slow StringBuffer and has a fail-fast check before switching each char against a rather large list. In the case that there are not that many accents it is much faster, and in either case the replacement of StringBuffer is much faster. It probably complements LUCENE-969.

        mikemccand Michael McCandless added a comment -

        Ahh, OK, thanks. I will re-open and merge in the fail-fast part; I've already replaced StringBuffer with char[] as part of LUCENE-969.

        mikemccand Michael McCandless added a comment -

        OK I merged the original patch with my commit from LUCENE-969. I
        plan to commit tomorrow. Thanks Ian!

        ianeboston Ian Boston added a comment -

My pleasure; it's nice for once to give something back to Lucene.
FYI, this is used for searching the texts of Charles Darwin's letters at
http://www.darwinproject.ac.uk/darwin/search/advanced ; once again
Lucene saved us thousands of hours of work.
Thank you.
Ian

        mikemccand Michael McCandless added a comment -

        I just committed this. Thanks Ian!

        stanislaw.osinski Stanislaw Osinski added a comment -

One possible (and probably large) speed-up for this code would be replacing the switch/case structure (which for most characters needs to be evaluated down to the "default", meaning lots of comparisons) with a plain static char[65536][] table. The cost for this is roughly 130 kB of memory, but the speed-up should be pretty good. If lots of people are using this filter and the memory cost is acceptable, later this week I can try to prepare another patch.
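The proposed lookup table might look roughly like this. Only an illustrative subset of the mappings is shown (the real filter covers many more characters), and a String[] is used here rather than char[][] purely for brevity.

```java
// A static table indexed directly by the char value replaces the switch.
// Null entries mean the character passes through unchanged.
public class TableLookup {
    private static final String[] MAP = new String[65536]; // ~full char range
    static {
        MAP['\u00C0'] = "A";  MAP['\u00C9'] = "E";  MAP['\u00C6'] = "AE";
        MAP['\u00E9'] = "e";  MAP['\u00E8'] = "e";  MAP['\u0153'] = "oe";
    }

    public static String strip(String s) {
        StringBuffer out = new StringBuffer(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            String r = MAP[c];          // one array access, no comparisons
            if (r == null) out.append(c); else out.append(r);
        }
        return out.toString();
    }
}
```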

        klaasm Mike Klaas added a comment -

The switch statement is not equivalent to a list of sequential if/elses; it is already implemented as an O(1) branch.

        dweiss Dawid Weiss added a comment -

Not exactly true, Mike. Switch statements are implemented as table lookups (the tableswitch opcode) or binary-search switches (lookupswitch), depending on the sparseness of the keys used in the case clauses, I believe; I don't have the time to dig out my copy of the JVM spec right now.

In any case, I'm pretty sure what Staszek says is correct and will be more efficient; I've seen that in practice, where doing a manual lookup instead of a switch (with multiple branches over a wide spectrum of keys) was much faster (an order of magnitude in my case).

        yseeley@gmail.com Yonik Seeley added a comment -

        It seems that any table based approach would need to take up 256 slots at most... one could get away with 64 too (0xff-0xc0+1).
        In any case, with normal latin1 text with an average amount of accents and a normal analyzer, does the current version really slow things down anymore?

        Dawid: a lookupswitch is O(1), no?

        dweiss Dawid Weiss added a comment -

        I guess it's a matter of just writing down two versions and comparing them against each other.

        Yonik: I don't think lookupswitch is O(1) – my belief is that lookupswitch is a binary search through the key set http://java.sun.com/docs/books/jvms/second_edition/html/Compiling.doc.html#14942. I guess you could make it O(1) with perfect hashing, for example, but if I were to guess it's not the reality. It's a good question though – I'll have a look at SUN's JVM sources in a spare moment.

        stanislaw.osinski Stanislaw Osinski added a comment -

        I've just quickly decompiled the ISOLatin1AccentFilter.class from lucene 2.2.0 distribution – the switch statement got compiled into a tableswitch – the faster of the two options (but with larger code size).

        Yonik: from what I understand from the JVM spec, it is the tableswitch that is O(1) and lookupswitch should be at least O(log(N)) (binary search through N case values).

        dweiss Dawid Weiss added a comment -

        Funny – I just did the same, but my compiler (Eclipse JDT) generated a lookupswitch from head –

        //* 52 93:iload 7
        {
        //* 53 95:lookupswitch 65: default 1275
        // 192: 624
        // 193: 624
        // 194: 624

        I also peeked at JVM's sources. Lookupswitches can be compiled (in hotspot mode) into two internal JVM instructions – fast_linearswitch or fast_binaryswitch. Depending on the selection you either get the O(1) acceleration or not. I still think it could be worthwhile to try to boost the performance of this method, although the gain might be minimal.

        dweiss Dawid Weiss added a comment -

To clarify: depending on the compiler/HotSpot, you may get constant time (tableswitch), O(log(N)) (fast_binaryswitch) or O(N) (fast_linearswitch). The exact checks for which route is followed are beyond me, but the JVM sources are out there, so I guess it's possible to check.

        dweiss Dawid Weiss added a comment -

        I was a bit curious about it, so I decided to write a table-lookup version. It does come out faster, but only by a small margin (especially on "server", hotspot JVMs).

Timings are in milliseconds; each round consisted of 100000 repetitions of parsing the test string "Des mot clés À LA CHAÎNE À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Œ Þ Ù Ú Û Ü Ý Ÿ à á â ã ä å æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö ø œ ß þ ù ú û ü ý ÿ". Note it is biased since most characters do have accents, which will not be the case in real life I guess... but still:

        // SUN JVM build 1.6.0-b105, -server mode
        Round (old): 1922
        Round (old): 1688
        Round (old): 1656
        Round (old): 1687
        Round (old): 1641
        Round (old): 1703
        Round (old): 1672
        Round (old): 1672
        Round (old): 1687
        Round (old): 1719
        Round (new): 1719
        Round (new): 1609
        Round (new): 1609
        Round (new): 1594
        Round (new): 1625
        Round (new): 1578
        Round (new): 1625
        Round (new): 1594
        Round (new): 1625
        Round (new): 1656

        // SUN JVM, 1.6.0, interpreted (-client)

        Round (old): 2391
        Round (old): 2453
        Round (old): 2359
        Round (old): 2375
        Round (old): 2391
        Round (old): 2359
        Round (old): 2156
        Round (old): 2532
        Round (old): 2422
        Round (old): 2359
        Round (new): 1969
        Round (new): 1906
        Round (new): 1922
        Round (new): 1937
        Round (new): 1985
        Round (new): 1922
        Round (new): 1906
        Round (new): 1937
        Round (new): 1985
        Round (new): 1922

        // IBM JVM 1.5.0 (don't know why it's so sluggish, really).

        Round (old): 7906
        Round (old): 7188
        Round (old): 7625
        Round (old): 7312
        Round (old): 7266
        Round (old): 7141
        Round (old): 7015
        Round (old): 5641
        Round (old): 5578
        Round (old): 5672
        Round (new): 4656
        Round (new): 4406
        Round (new): 4516
        Round (new): 4516
        Round (new): 4375
        Round (new): 4375
        Round (new): 4343
        Round (new): 4297
        Round (new): 4344
        Round (new): 4266

        // IBM 1.5.0, -server (note the speed improvement when the old version is hotspot-optimized).

        Round (old): 5922
        Round (old): 5078
        Round (old): 5078
        Round (old): 5062
        Round (old): 4985
        Round (old): 4875
        Round (old): 4953
        Round (old): 4641
        Round (old): 3640
        Round (old): 3735
        Round (new): 3750
        Round (new): 3781
        Round (new): 3656
        Round (new): 3516
        Round (new): 3515
        Round (new): 3594
        Round (new): 3547
        Round (new): 3562
        Round (new): 3532
        Round (new): 3531

So... it does come out a bit faster. Whether it makes sense to spend 130 kB of memory for this improvement... I don't know, really. I'll upload the table-lookup version for your reference.
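The shape of the per-round timing loop used above is roughly the following; the filter method here is an illustrative stand-in for the real ISOLatin1AccentFilter invocation, not the actual benchmark code.

```java
// Ten rounds of 100000 repetitions each, timed with currentTimeMillis,
// matching the "Round (...)" output format above.
public class Rounds {
    /** Stand-in for the real accent filter: strips one sample accent. */
    public static String filter(String s) {
        return s.replace('\u00E9', 'e');
    }

    public static void main(String[] args) {
        String test = "Des mot cl\u00E9s \u00C0 LA CHA\u00CENE";
        for (int round = 0; round < 10; round++) {
            long start = System.currentTimeMillis();
            for (int i = 0; i < 100000; i++) filter(test);
            System.out.println("Round: " + (System.currentTimeMillis() - start));
        }
    }
}
```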

        dweiss Dawid Weiss added a comment -

        A table-lookup version of ISO latin filter (this is not a patch, it's a renamed class so that you can compare the two).

        ijuma Ismael Juma added a comment -

        I guess there's no better way than to verify the sources as Dawid did, but here's a post that also explains the behaviour of the -server HotSpot JIT when it comes to switch statements:

        http://forums.java.net/jive/message.jspa?messageID=204492#204492


          People

• Assignee:
  mikemccand Michael McCandless
• Reporter:
  ianeboston Ian Boston
• Votes: 0
• Watchers: 1
