  1. Solr
  2. SOLR-1353

implement reusable token streams for all Solr tokenizers / token filters

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.4
    • Fix Version/s: 1.4
    • Component/s: None
    • Labels: None

      Description

      The new Lucene token architecture causes poor indexing performance unless reusable token streams are used.
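
      To illustrate the reuse pattern this issue is about, here is a minimal, self-contained sketch. It does not use the Lucene or Solr API: SimpleTokenizer, SimpleAnalyzer, and the instancesCreated counter are hypothetical stand-ins invented for this example. It assumes that reuse means keeping one tokenizer per thread and re-pointing it at each new Reader rather than building a new one per field.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: SimpleTokenizer and SimpleAnalyzer are hypothetical
// stand-ins, not Lucene or Solr classes.
final class ReusableStreamSketch {

    /** Hypothetical whitespace tokenizer that can be re-pointed at a new Reader. */
    static final class SimpleTokenizer {
        // Allocation counter, for illustration only (not thread-safe).
        static int instancesCreated = 0;

        private Reader input;
        private final StringBuilder term = new StringBuilder(); // buffer reused across tokens

        SimpleTokenizer(Reader input) {
            this.input = input;
            instancesCreated++;
        }

        /** Re-targets this tokenizer at new input instead of allocating another one. */
        void reset(Reader newInput) {
            this.input = newInput;
            term.setLength(0);
        }

        /** Returns the next whitespace-delimited token, or null at end of input. */
        String next() {
            term.setLength(0);
            try {
                int c;
                while ((c = input.read()) != -1) {
                    if (Character.isWhitespace(c)) {
                        if (term.length() > 0) break; // token finished
                    } else {
                        term.append((char) c);
                    }
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return term.length() > 0 ? term.toString() : null;
        }
    }

    /** Hypothetical analyzer offering both a fresh and a reusable stream. */
    static final class SimpleAnalyzer {
        private final ThreadLocal<SimpleTokenizer> cached = new ThreadLocal<>();

        /** Non-reusing path: allocates a new tokenizer on every call. */
        SimpleTokenizer tokenStream(Reader reader) {
            return new SimpleTokenizer(reader);
        }

        /** Reusing path: one tokenizer per thread, reset for each new input. */
        SimpleTokenizer reusableTokenStream(Reader reader) {
            SimpleTokenizer t = cached.get();
            if (t == null) {
                t = new SimpleTokenizer(reader);
                cached.set(t);
            } else {
                t.reset(reader);
            }
            return t;
        }
    }

    /** Drains a tokenizer into a list. */
    static List<String> tokens(SimpleTokenizer t) {
        List<String> out = new ArrayList<>();
        for (String tok = t.next(); tok != null; tok = t.next()) {
            out.add(tok);
        }
        return out;
    }

    public static void main(String[] args) {
        SimpleAnalyzer analyzer = new SimpleAnalyzer();
        String[] fields = {"quick brown fox", "lazy dog", "reuse me"};

        for (String f : fields) tokens(analyzer.tokenStream(new StringReader(f)));
        int fresh = SimpleTokenizer.instancesCreated;

        SimpleTokenizer.instancesCreated = 0;
        for (String f : fields) tokens(analyzer.reusableTokenStream(new StringReader(f)));
        int reused = SimpleTokenizer.instancesCreated;

        System.out.println("fresh=" + fresh + " reused=" + reused); // prints "fresh=3 reused=1"
    }
}
```

      In this sketch the non-reusing path allocates one tokenizer per field value, while the reusing path allocates one per thread. A real analysis chain also carries filters and attribute state, so the per-field cost of not reusing is likely higher than this toy counter suggests.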

      1. SOLR-1353.patch
        40 kB
        Yonik Seeley

        Activity

        rcmuir Robert Muir added a comment -

        Yonik, at least in the case of analyzer class=xxx, I think many Lucene contrib analyzers do not even implement reusableTokenStream, so they are creating a new token stream each time!

        yseeley@gmail.com Yonik Seeley added a comment -

        Patch implementing reusable analyzers.
        Simple filters have been converted to use the new API.
        Complex filters such as the synonym filter and the word delimiter filter (WDF) have not been converted.

        yseeley@gmail.com Yonik Seeley added a comment -

        Committed.

        rcmuir Robert Muir added a comment -

        Seems to almost double throughput. How does this compare to the pre-reflection code? Is it actually any faster?

        yseeley@gmail.com Yonik Seeley added a comment -

        Yes, on my simple short-field test, I got about a 90% increase in performance vs. the pre-reflection (but still attribute-based) code.
        I don't know how it compares to the pre-attributes code.

        yseeley@gmail.com Yonik Seeley added a comment -

        FYI, with all these changes, but with reuse turned off, I was seeing 10% slower performance than the pre-reflection code. Some of that performance impact could have been due to more mixing of old and new style APIs, or proper clearing of attributes, etc.

        gsingers Grant Ingersoll added a comment -

        Bulk close Solr 1.4 issues


          People

          • Assignee:
            yseeley@gmail.com Yonik Seeley
            Reporter:
            yseeley@gmail.com Yonik Seeley