Solr
  1. Solr
  2. SOLR-7765

TokenizerChain methods may return null depending on how constructor is called -- causes NPE in luke request handler

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 5.2.1
    • Fix Version/s: 5.3, 6.0
    • Component/s: None
    • Labels:
      None
    • Flags:
      Patch

      Description

      TokenizerChain created using 2-arg constructor has null in charFilters, so LukeRequestHandler throws NPE on iterating it.

      TokenizerChain constructor's should be hardened to do explicit null checks, throwing early NPE where appropriate (tokenizer factory), or initializing internal arrays to have 0 length when optional (factories for char filters and token filters)

      1. SOLR-7765.patch
        17 kB
        Hoss Man
      2. SOLR-7765.patch
        9 kB
        Hoss Man
      3. SOLR-7765.patch
        1 kB
        Hoss Man

        Activity

        Hide
        ASF GitHub Bot added a comment -

        GitHub user grossws opened a pull request:

        https://github.com/apache/lucene-solr/pull/185

        Fix NPE in LukeRequestHandler in getAnalyzerInfo

        NPE in thrown when luke tries to iterate `TokenizerChain#getCharFilterFactories()`
        which is null if 2-arg `TokenizerChain` constructor is used.

        Fixes SOLR-7765

        You can merge this pull request into a Git repository by running:

        $ git pull https://github.com/grossws/lucene-solr fix-solr-7765

        Alternatively you can review and apply these changes as the patch at:

        https://github.com/apache/lucene-solr/pull/185.patch

        To close this pull request, make a commit to your master/trunk branch
        with (at least) the following in the commit message:

        This closes #185


        commit 7a0f08915e73f0ef28d4b9af8bd613612ca39946
        Author: Konstantin Gribov <grossws@gmail.com>
        Date: 2015-07-08T16:30:04Z

        Fix NPE in LukeRequestHandler in getAnalyzerInfo

        NPE in thrown when luke tries to iterate `TokenizerChain#getCharFilterFactories()`
        which is null if 2-arg `TokenizerChain` constructor is used.

        Fixes SOLR-7765


        Show
        ASF GitHub Bot added a comment - GitHub user grossws opened a pull request: https://github.com/apache/lucene-solr/pull/185 Fix NPE in LukeRequestHandler in getAnalyzerInfo NPE in thrown when luke tries to iterate `TokenizerChain#getCharFilterFactories()` which is null if 2-arg `TokenizerChain` constructor is used. Fixes SOLR-7765 You can merge this pull request into a Git repository by running: $ git pull https://github.com/grossws/lucene-solr fix-solr-7765 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/lucene-solr/pull/185.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #185 commit 7a0f08915e73f0ef28d4b9af8bd613612ca39946 Author: Konstantin Gribov <grossws@gmail.com> Date: 2015-07-08T16:30:04Z Fix NPE in LukeRequestHandler in getAnalyzerInfo NPE in thrown when luke tries to iterate `TokenizerChain#getCharFilterFactories()` which is null if 2-arg `TokenizerChain` constructor is used. Fixes SOLR-7765
        Hide
        Hoss Man added a comment -

        Konstantin: i tried to whip up a quick patch to demonstrate the bug you were fixing (we always like to have test cases to help minimize the chance of regression in bug fixes) but i couldn't reproduce the problem (ie: the test passed)

        can you please elaborate on the steps to reproduce in order to trigger an NPE in the Luke handler?

        Show
        Hoss Man added a comment - Konstantin: i tried to whip up a quick patch to demonstrate the bug you were fixing (we always like to have test cases to help minimize the chance of regression in bug fixes) but i couldn't reproduce the problem (ie: the test passed) can you please elaborate on the steps to reproduce in order to trigger an NPE in the Luke handler?
        Hide
        Konstantin Gribov added a comment -

        I am not familiar with lucene-solr test suite. It seems that TokenizerChain was created standard way from schema (via FieldTypePluginLoader, line 403@lucene_solr_5_2_1), so it was created with

        new TokenizerChain(charFilters.toArray(new CharFilterFactory[charFilters.size()]), tokenizers.get(0), filters.toArray(new TokenFilterFactory[filters.size()]));
        

        which is three-arg constructor with non-null first argument (it will be empty array when charFilters is empty).

        I'll add a test to my PR.

        Show
        Konstantin Gribov added a comment - I am not familiar with lucene-solr test suite. It seems that TokenizerChain was created standard way from schema (via FieldTypePluginLoader , line 403@lucene_solr_5_2_1), so it was created with new TokenizerChain(charFilters.toArray( new CharFilterFactory[charFilters.size()]), tokenizers.get(0), filters.toArray( new TokenFilterFactory[filters.size()])); which is three-arg constructor with non-null first argument (it will be empty array when charFilters is empty). I'll add a test to my PR.
        Hide
        Konstantin Gribov added a comment -

        Updated PR with test case

        Show
        Konstantin Gribov added a comment - Updated PR with test case
        Hide
        Hoss Man added a comment -

        I'll add a test to my PR.

        Thanks!

        I missunderstood what you ment before, but with the testcase you provided it all makes sense.

        In my opinion, the root bug here is that TokenizerChain should be more explicit about what is allowed in it's construtor, and more resilient to null args when things are optional – that way callers like LukeAdminHandler don't have to constantly do null checks.

        The attached path fixes what i consider the root of the bug and gets your test to pass w/o modifying LukeAdminHandler. It also adds more randomization to your test to cover more permutations of options, and updates MultiTermTest to account for the improved behavior of getCharFilterFactories() (which you can see from looking at that test was annoying inconsistent before depending on what analyzer was used and where it came from)

        I'm going to do a quick audit of all TokenizerChain clients to see where else null checks are currently be doing that can be optimized away with this fix and post an updated patch.

        Show
        Hoss Man added a comment - I'll add a test to my PR. Thanks! I missunderstood what you ment before, but with the testcase you provided it all makes sense. In my opinion, the root bug here is that TokenizerChain should be more explicit about what is allowed in it's construtor, and more resilient to null args when things are optional – that way callers like LukeAdminHandler don't have to constantly do null checks. The attached path fixes what i consider the root of the bug and gets your test to pass w/o modifying LukeAdminHandler. It also adds more randomization to your test to cover more permutations of options, and updates MultiTermTest to account for the improved behavior of getCharFilterFactories() (which you can see from looking at that test was annoying inconsistent before depending on what analyzer was used and where it came from) I'm going to do a quick audit of all TokenizerChain clients to see where else null checks are currently be doing that can be optimized away with this fix and post an updated patch.
        Hide
        Hoss Man added a comment -

        I'm going to do a quick audit of all TokenizerChain clients to see where else null checks are currently be doing that can be optimized away with this fix and post an updated patch.

        attached.

        Show
        Hoss Man added a comment - I'm going to do a quick audit of all TokenizerChain clients to see where else null checks are currently be doing that can be optimized away with this fix and post an updated patch. attached.
        Hide
        Konstantin Gribov added a comment -

        Fixing this in TokenizerChain itself seems better to me.

        Is there any reason to use reversed order in ifs (like if (0 < cfiltfacs.length))? It looks quite strange since most of lucene & solr code use more usual if (var > 0) style.

        Show
        Konstantin Gribov added a comment - Fixing this in TokenizerChain itself seems better to me. Is there any reason to use reversed order in ifs (like if (0 < cfiltfacs.length) )? It looks quite strange since most of lucene & solr code use more usual if (var > 0) style.
        Hide
        ASF subversion and git services added a comment -

        Commit 1692170 from hossman@apache.org in branch 'dev/trunk'
        [ https://svn.apache.org/r1692170 ]

        SOLR-7765: Hardened the behavior of TokenizerChain when null arguments are used in constructor. This prevents NPEs in some code paths.

        Show
        ASF subversion and git services added a comment - Commit 1692170 from hossman@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1692170 ] SOLR-7765 : Hardened the behavior of TokenizerChain when null arguments are used in constructor. This prevents NPEs in some code paths.
        Hide
        ASF subversion and git services added a comment -

        Commit 1692174 from hossman@apache.org in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1692174 ]

        SOLR-7765: Hardened the behavior of TokenizerChain when null arguments are used in constructor. This prevents NPEs in some code paths. (merge r1692170)

        Show
        ASF subversion and git services added a comment - Commit 1692174 from hossman@apache.org in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1692174 ] SOLR-7765 : Hardened the behavior of TokenizerChain when null arguments are used in constructor. This prevents NPEs in some code paths. (merge r1692170)
        Hide
        Hoss Man added a comment -

        Thanks for rasiing this issue and helping out with the tests Konstantin,

        Is there any reason to use reversed order in ifs...

        It's just a defensive coding habbit i developed from doing a lot of C and perl coding in my early years as a developer ... trained my brain to default to putting constants on the left in comparisons in cause i misstype "=" instead of "==" or "<="

        Show
        Hoss Man added a comment - Thanks for rasiing this issue and helping out with the tests Konstantin, Is there any reason to use reversed order in ifs... It's just a defensive coding habbit i developed from doing a lot of C and perl coding in my early years as a developer ... trained my brain to default to putting constants on the left in comparisons in cause i misstype "=" instead of "==" or "<="
        Hide
        Konstantin Gribov added a comment -

        I understand you point, I write in C time to time. My point is that it leads to inconsistency in code style without any profit (like eliminating error-prone constructs in C). It makes no sense in java, since for types other than boolean it will produce compile-time error (because type incompatibility) and for boolean ifs will look like if (val) or if (!val). It seems that there's no place to make this types of error in java.

        Show
        Konstantin Gribov added a comment - I understand you point, I write in C time to time. My point is that it leads to inconsistency in code style without any profit (like eliminating error-prone constructs in C). It makes no sense in java, since for types other than boolean it will produce compile-time error (because type incompatibility) and for boolean ifs will look like if (val) or if (!val) . It seems that there's no place to make this types of error in java.
        Hide
        ASF GitHub Bot added a comment -

        Github user grossws closed the pull request at:

        https://github.com/apache/lucene-solr/pull/185

        Show
        ASF GitHub Bot added a comment - Github user grossws closed the pull request at: https://github.com/apache/lucene-solr/pull/185
        Hide
        ASF GitHub Bot added a comment -

        Github user grossws commented on the pull request:

        https://github.com/apache/lucene-solr/pull/185#issuecomment-124117829

        Fixed by @hossman in SOLR-7765

        Show
        ASF GitHub Bot added a comment - Github user grossws commented on the pull request: https://github.com/apache/lucene-solr/pull/185#issuecomment-124117829 Fixed by @hossman in SOLR-7765
        Hide
        Shalin Shekhar Mangar added a comment -

        Bulk close for 5.3.0 release

        Show
        Shalin Shekhar Mangar added a comment - Bulk close for 5.3.0 release

          People

          • Assignee:
            Hoss Man
            Reporter:
            Konstantin Gribov
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development