Solr / SOLR-1353

implement reusable token streams for all Solr tokenizers / token filters

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.4
    • Fix Version/s: 1.4
    • Component/s: None
    • Labels:
      None

      Description

      The new Lucene token architecture causes poor indexing performance unless reusable token streams are used.
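The sketch below shows why reuse matters. It is plain, self-contained Java and not Lucene's API: every class and method in it is hypothetical, although tokenStream and reusableTokenStream loosely echo the Lucene method names discussed in this issue. The reusing path keeps one tokenizer per thread and points it at new input with reset(). The non-reusing path allocates a new tokenizer, with its own buffers, for every field value.

```java
// Hypothetical, simplified tokenizer; NOT Lucene's Tokenizer class.
class SimpleTokenizer {
    static int instancesCreated = 0;                // counts allocations for illustration
    private final StringBuilder term = new StringBuilder(); // term buffer reused across tokens
    private String input = "";
    private int pos = 0;

    SimpleTokenizer() { instancesCreated++; }

    // Point this existing instance at new input instead of allocating a new one.
    void reset(String newInput) { input = newInput; pos = 0; }

    // Advance to the next whitespace-separated token; returns false when exhausted.
    boolean incrementToken() {
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) pos++;
        if (pos >= input.length()) return false;
        term.setLength(0);
        while (pos < input.length() && !Character.isWhitespace(input.charAt(pos))) {
            term.append(input.charAt(pos++));
        }
        return true;
    }

    String term() { return term.toString(); }
}

// Hypothetical analyzer offering both a reusing and a non-reusing entry point.
class ReusingAnalyzer {
    // One cached tokenizer per thread, created lazily on first use.
    private final ThreadLocal<SimpleTokenizer> cached =
        ThreadLocal.withInitial(SimpleTokenizer::new);

    SimpleTokenizer reusableTokenStream(String text) {
        SimpleTokenizer t = cached.get();
        t.reset(text);                              // reuse: no new allocation
        return t;
    }

    SimpleTokenizer tokenStream(String text) {
        SimpleTokenizer t = new SimpleTokenizer();  // fresh allocation on every call
        t.reset(text);
        return t;
    }
}
```

Indexing N field values through reusableTokenStream allocates one tokenizer per thread, while tokenStream allocates N of them. This per-value allocation and setup cost is what reuse avoids.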

      1. SOLR-1353.patch
        40 kB
        Yonik Seeley

        Activity

        Robert Muir added a comment -

        Yonik, at least in the case of <analyzer class="xxx"> in the schema, I think many Lucene contrib analyzers don't even implement reusableTokenStream (so they create a new copy each time)!
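Robert's point can be modelled as follows. This is a minimal sketch in plain Java, not Lucene's Analyzer class, and all names are hypothetical. The assumption is that the base class's "reusable" entry point simply falls back to building a new stream, so an analyzer that never overrides it gets no reuse at all. An analyzer that does override it can keep one stream per thread and reset it.

```java
// Hypothetical minimal token stream that just holds its input; NOT Lucene's TokenStream.
class CountingStream {
    static int allocations = 0;   // counts how many streams were created
    String input;
    CountingStream(String input) { this.input = input; allocations++; }
    void reset(String input) { this.input = input; }
}

// Hypothetical base analyzer. Its default "reusable" method delegates to the
// non-reusing factory, so subclasses that never override it allocate per call.
abstract class BaseAnalyzer {
    abstract CountingStream tokenStream(String text);

    CountingStream reusableTokenStream(String text) {
        return tokenStream(text); // fallback: a new stream every time
    }
}

// Like the contrib analyzers described above: only tokenStream() is implemented.
class NonReusingAnalyzer extends BaseAnalyzer {
    CountingStream tokenStream(String text) { return new CountingStream(text); }
}

// Overrides the reusable path and keeps one stream per thread.
class PerThreadAnalyzer extends BaseAnalyzer {
    private final ThreadLocal<CountingStream> saved = new ThreadLocal<>();

    CountingStream tokenStream(String text) { return new CountingStream(text); }

    @Override
    CountingStream reusableTokenStream(String text) {
        CountingStream s = saved.get();
        if (s == null) {
            s = tokenStream(text);  // first call on this thread: build and cache
            saved.set(s);
        } else {
            s.reset(text);          // later calls: reuse the cached instance
        }
        return s;
    }
}
```

Calling code that always goes through reusableTokenStream therefore gets reuse only from analyzers that actually override it.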

        Yonik Seeley added a comment -

        Patch implementing reusable analyzers.
        Simple filters have been converted to use the new API.
        Complex filters such as SynonymFilter and WordDelimiterFilter have not been converted.
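A "simple" filter in the attribute-based style can be sketched as follows. This is plain, self-contained Java that only imitates the pattern and is not Lucene's TokenFilter or TermAttribute API. All class names are hypothetical. The source and the filter share one mutable term attribute, each call to incrementToken() advances the chain, and the filter rewrites the shared buffer in place rather than allocating a new token object per term.

```java
// Hypothetical stand-in for a shared, mutable term attribute (not Lucene's TermAttribute).
class TermAttr {
    final StringBuilder buffer = new StringBuilder();
    void set(String s) { buffer.setLength(0); buffer.append(s); }
    @Override public String toString() { return buffer.toString(); }
}

// Hypothetical base for attribute-style streams: the whole chain shares one TermAttr.
abstract class AttrStream {
    final TermAttr term;
    AttrStream(TermAttr term) { this.term = term; }
    abstract boolean incrementToken();   // true if a new token is now in the attributes
}

// Source: emits whitespace-separated words into the shared attribute.
class WordSource extends AttrStream {
    private final String[] words;
    private int i = 0;

    WordSource(String text) {
        super(new TermAttr());
        String trimmed = text.trim();
        words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    boolean incrementToken() {
        if (i >= words.length) return false;
        term.set(words[i++]);
        return true;
    }
}

// Filter: pulls from its input, then lowercases the shared buffer in place.
class LowerCaseAttrFilter extends AttrStream {
    private final AttrStream input;

    LowerCaseAttrFilter(AttrStream input) {
        super(input.term);               // share the upstream attribute, no copy
        this.input = input;
    }

    boolean incrementToken() {
        if (!input.incrementToken()) return false;
        StringBuilder b = term.buffer;
        for (int k = 0; k < b.length(); k++) {
            b.setCharAt(k, Character.toLowerCase(b.charAt(k)));
        }
        return true;
    }
}
```

Filters that only transform the current token fit this shape easily. Filters that buffer, inject, or merge tokens (as synonym expansion and word delimiting do) must capture and restore attribute state across calls, which makes them harder to convert.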

        Yonik Seeley added a comment -

        Committed.

        Robert Muir added a comment -

        This seems to almost double throughput. How does it compare to the pre-reflection code, etc.? Is it actually any faster?

        Yonik Seeley added a comment -

        Yes. On my simple short-field test, I got about a 90% increase in performance vs. the pre-reflection (but still attribute-based) code.
        I don't know how it compares to the pre-attribute code.

        Yonik Seeley added a comment -

        FYI, with all these changes but with reuse turned off, I was seeing 10% slower performance than with the pre-reflection code. Some of that slowdown could have been due to more mixing of old- and new-style APIs, to proper clearing of attributes, etc.
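"Proper clearing of attributes" refers to resetting per-token state before each new token. The sketch below is plain Java with hypothetical names, not Lucene's AttributeSource. It shows why clearing is needed: a source that only sets a "type" for numeric tokens leaks that value into the next token unless the attributes are reset first. That reset is work done for every token, which is one plausible contributor to the overhead mentioned above.

```java
// Hypothetical attribute holder with two per-token fields; NOT Lucene's API.
class Attrs {
    String term = "";
    String type = "word";                         // default token type

    void clear() { term = ""; type = "word"; }    // restore defaults before each token
}

// Hypothetical source that only sets "type" for all-digit tokens.
class TypedSource {
    final Attrs attrs = new Attrs();
    private final String[] words;
    private final boolean clearEachToken;
    private int i = 0;

    TypedSource(String[] words, boolean clearEachToken) {
        this.words = words;
        this.clearEachToken = clearEachToken;
    }

    boolean incrementToken() {
        if (i >= words.length) return false;
        if (clearEachToken) attrs.clear();        // per-token cost, but no stale state
        String w = words[i++];
        attrs.term = w;
        if (w.chars().allMatch(Character::isDigit)) attrs.type = "num";
        return true;
    }
}
```

Without clear(), the token "abc" that follows "42" would still report type "num".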

        Grant Ingersoll added a comment -

        Bulk close Solr 1.4 issues


          People

          • Assignee: Yonik Seeley
          • Reporter: Yonik Seeley
          • Votes: 0
          • Watchers: 0

            Dates

            • Created:
              Updated:
              Resolved:
