Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-8725

Cores, collections, and shards should accept hyphens in identifier name

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 5.5
    • Fix Version/s: 5.5.1, 6.0, 7.0
    • Component/s: None
    • Labels:
      None

      Description

      In SOLR-8642, hyphens are no longer considered valid identifiers for cores (and collections?). Our solr instance was successfully using hyphens in our core names, and our affected cores now error with:

      marc-profiler_shard1_replica1: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Invalid name: 'marc-profiler_shard1_replica1' Identifiers must consist entirely of periods, underscores and alphanumerics

      Before starting to rename all of our collections, I wonder if this decision could be revisited to be backwards compatible with previously created collections.

      1. SOLR-8725.patch
        19 kB
        Anshum Gupta
      2. SOLR-8725.patch
        0.8 kB
        Anshum Gupta
      3. SOLR-8725-5_5.patch
        2 kB
        Shawn Heisey
      4. SOLR-8725-fix-regex.patch
        2 kB
        Anshum Gupta

        Issue Links

          Activity

          Hide
          janhoy Jan Høydahl added a comment -

          It makes sense to allow hyphens, but not as first character of the name. In SOLR-8110 we discuss rules for all kind of identifiers for 6.x and so far there seems to be consensus to allow both hyphens, dots and underscores.

          Show
          janhoy Jan Høydahl added a comment - It makes sense to allow hyphens, but not as first character of the name. In SOLR-8110 we discuss rules for all kind of identifiers for 6.x and so far there seems to be consensus to allow both hyphens, dots and underscores.
          Hide
          arcadius Arcadius Ahouansou added a comment - - edited

          We just got hit by this issue.

          We are just replacing '-' by '_' as a workaround.

          Good we are using aliases to access the collections, so not much code change.

          Show
          arcadius Arcadius Ahouansou added a comment - - edited We just got hit by this issue. We are just replacing '-' by '_' as a workaround. Good we are using aliases to access the collections, so not much code change.
          Hide
          gerlowskija Jason Gerlowski added a comment -

          For context, this was initially changed in SOLR-8308. You can read up on the rationale/discussion there. To summarize, there were some issues with cores becoming inaccessible after being renamed, when the new name contained quirky characters. When enforcement was put in, the regex as written to match existing Solr "recommendations", which suggest that identifiers should only contain alphanumerics, periods, and underscores.

          There's been some pushback on treating hyphens as invalid, particularly on SOLR-8110, and on SOLR-8642. I think the current consensus is that this behavior will be changed in SOLR-8110.

          Show
          gerlowskija Jason Gerlowski added a comment - For context, this was initially changed in SOLR-8308 . You can read up on the rationale/discussion there. To summarize, there were some issues with cores becoming inaccessible after being renamed, when the new name contained quirky characters. When enforcement was put in, the regex as written to match existing Solr "recommendations", which suggest that identifiers should only contain alphanumerics, periods, and underscores. There's been some pushback on treating hyphens as invalid, particularly on SOLR-8110 , and on SOLR-8642 . I think the current consensus is that this behavior will be changed in SOLR-8110 .
          Hide
          jwartes Jeff Wartes added a comment -

          Well, I'm glad I haven't tried moving to 5.5 yet, this would have been an unpleasant migration discovery.
          I use hyphens in both collection names and alias names. (although not as a leading character)
          Generally, I prefer to avoid using underscores anyplace that ends up in a URL.

          Show
          jwartes Jeff Wartes added a comment - Well, I'm glad I haven't tried moving to 5.5 yet, this would have been an unpleasant migration discovery. I use hyphens in both collection names and alias names. (although not as a leading character) Generally, I prefer to avoid using underscores anyplace that ends up in a URL.
          Hide
          anshumg Anshum Gupta added a comment -

          After discussions on other JIRAs mentioned above, I'll add the hypen to the allowed char list and an added check to make sure that it's not the first char when naming a core, collection, shard etc.

          Show
          anshumg Anshum Gupta added a comment - After discussions on other JIRAs mentioned above, I'll add the hypen to the allowed char list and an added check to make sure that it's not the first char when naming a core, collection, shard etc.
          Hide
          anshumg Anshum Gupta added a comment -

          Patch without tests. Will add them in a bit and commit.
          This allows hyphens in the name but not as the first char.

          Show
          anshumg Anshum Gupta added a comment - Patch without tests. Will add them in a bit and commit. This allows hyphens in the name but not as the first char.
          Hide
          gerlowskija Jason Gerlowski added a comment -

          Hey Anshum Gupta, were you planning on changing the messages that get returned from the core/collection/shard APIs when an invalid identifier is used? I think they mention specifically what characters are allowed in a valid identifier. If the regex is changing, the messages should probably be updated also. (Might make sense to pull the message into a single constant that lives in SolrIdentifierValidator too, but maybe it doesn't, I'd have to look.)

          Happy to update the patch myself with those changes if you want. Wanted to check first though; didn't want to step on your toes if there's a reason you didn't do it.

          Show
          gerlowskija Jason Gerlowski added a comment - Hey Anshum Gupta , were you planning on changing the messages that get returned from the core/collection/shard APIs when an invalid identifier is used? I think they mention specifically what characters are allowed in a valid identifier. If the regex is changing, the messages should probably be updated also. (Might make sense to pull the message into a single constant that lives in SolrIdentifierValidator too, but maybe it doesn't, I'd have to look.) Happy to update the patch myself with those changes if you want. Wanted to check first though; didn't want to step on your toes if there's a reason you didn't do it.
          Hide
          mschumann Michael Schumann added a comment -

          This issue is stopping us from using 5.5 since we have a lot of infrastructure built around managing our collections and it would be expensive for us to adapt it to not use a hyphen. Please fix it with a point release.

          Show
          mschumann Michael Schumann added a comment - This issue is stopping us from using 5.5 since we have a lot of infrastructure built around managing our collections and it would be expensive for us to adapt it to not use a hyphen. Please fix it with a point release.
          Hide
          anshumg Anshum Gupta added a comment -

          Jason Gerlowski yes, it makes sense to consolidate all the messages in one place. I've done that in this patch which I'm about to commit. Thanks for looking at it.

          Show
          anshumg Anshum Gupta added a comment - Jason Gerlowski yes, it makes sense to consolidate all the messages in one place. I've done that in this patch which I'm about to commit. Thanks for looking at it.
          Hide
          anshumg Anshum Gupta added a comment -

          Yes, we should back-port this to a 5x release whenever that happens.

          Show
          anshumg Anshum Gupta added a comment - Yes, we should back-port this to a 5x release whenever that happens.
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 6de2b7dbd17fc70fc9b2b053fe2628534116309b in lucene-solr's branch refs/heads/master from anshum
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=6de2b7d ]

          SOLR-8725: Allow hyphen in shard, collection, core, and alias names but not the first char

          Show
          jira-bot ASF subversion and git services added a comment - Commit 6de2b7dbd17fc70fc9b2b053fe2628534116309b in lucene-solr's branch refs/heads/master from anshum [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=6de2b7d ] SOLR-8725 : Allow hyphen in shard, collection, core, and alias names but not the first char
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 7daad8d7d17b429adbd6cf61474a81b7c7bdf9c9 in lucene-solr's branch refs/heads/master from anshum
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7daad8d ]

          SOLR-8725: Fix precommit check

          Show
          jira-bot ASF subversion and git services added a comment - Commit 7daad8d7d17b429adbd6cf61474a81b7c7bdf9c9 in lucene-solr's branch refs/heads/master from anshum [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7daad8d ] SOLR-8725 : Fix precommit check
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 7e59ba4220d836d205f454a35a9c8ae192c28a26 in lucene-solr's branch refs/heads/branch_6x from anshum
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7e59ba4 ]

          SOLR-8725: Allow hyphen in shard, collection, core, and alias names but not the first char

          Show
          jira-bot ASF subversion and git services added a comment - Commit 7e59ba4220d836d205f454a35a9c8ae192c28a26 in lucene-solr's branch refs/heads/branch_6x from anshum [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7e59ba4 ] SOLR-8725 : Allow hyphen in shard, collection, core, and alias names but not the first char
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 73d2d1125f43d6f482b0be7ecfccd673d1fe6d41 in lucene-solr's branch refs/heads/branch_6x from anshum
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=73d2d11 ]

          SOLR-8725: Fix precommit check

          Show
          jira-bot ASF subversion and git services added a comment - Commit 73d2d1125f43d6f482b0be7ecfccd673d1fe6d41 in lucene-solr's branch refs/heads/branch_6x from anshum [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=73d2d11 ] SOLR-8725 : Fix precommit check
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 3f15560b519f41eb579d114363a4874aa585b324 in lucene-solr's branch refs/heads/branch_6_0 from anshum
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=3f15560b ]

          SOLR-8725: Allow hyphen in shard, collection, core, and alias names but not the first char

          Show
          jira-bot ASF subversion and git services added a comment - Commit 3f15560b519f41eb579d114363a4874aa585b324 in lucene-solr's branch refs/heads/branch_6_0 from anshum [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=3f15560b ] SOLR-8725 : Allow hyphen in shard, collection, core, and alias names but not the first char
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 2fef533fe97be90ffb41daee83b4d05c88cc3a7a in lucene-solr's branch refs/heads/branch_6_0 from anshum
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=2fef533 ]

          SOLR-8725: Fix precommit check

          Show
          jira-bot ASF subversion and git services added a comment - Commit 2fef533fe97be90ffb41daee83b4d05c88cc3a7a in lucene-solr's branch refs/heads/branch_6_0 from anshum [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=2fef533 ] SOLR-8725 : Fix precommit check
          Hide
          elyograg Shawn Heisey added a comment -

          I could not cherry-pick this change to the 5.5 branch, or even apply it as a patch. The later branches have deviated too far.

          Instead I hand-edited a minimally invasive change to the pattern, the javadoc, and the message in the thrown exception. Patch for branch_5_5 attached.

          If I temporarily revert an ill-advised commit that I made to change the version to 5.5.1, precommit passes.

          Show
          elyograg Shawn Heisey added a comment - I could not cherry-pick this change to the 5.5 branch, or even apply it as a patch. The later branches have deviated too far. Instead I hand-edited a minimally invasive change to the pattern, the javadoc, and the message in the thrown exception. Patch for branch_5_5 attached. If I temporarily revert an ill-advised commit that I made to change the version to 5.5.1, precommit passes.
          Hide
          anshumg Anshum Gupta added a comment -

          Shawn pointed out to an issue with the regular expression to me on irc. Here's a fix with a quick test.

          Shawn Heisey, mind taking a look?

          Show
          anshumg Anshum Gupta added a comment - Shawn pointed out to an issue with the regular expression to me on irc. Here's a fix with a quick test. Shawn Heisey , mind taking a look?
          Hide
          elyograg Shawn Heisey added a comment -

          I think the regex did work before – I tried a whole bunch of creation commands to see if I could break it and it seemed to work as advertised. The new one is easier to understand, but I believe it does the same task.

          My main concern was getting the fix backported to the 5.5 branch, so if we do release 5.5.1, it will have the fix.

          Show
          elyograg Shawn Heisey added a comment - I think the regex did work before – I tried a whole bunch of creation commands to see if I could break it and it seemed to work as advertised. The new one is easier to understand, but I believe it does the same task. My main concern was getting the fix backported to the 5.5 branch, so if we do release 5.5.1, it will have the fix.
          Hide
          anshumg Anshum Gupta added a comment -

          I really thought that the regex didn't work but the test I added seems to say otherwise! I'll skip the patch.
          I'll take a look at the 5.5.1 fix. Thanks for the patch.

          Show
          anshumg Anshum Gupta added a comment - I really thought that the regex didn't work but the test I added seems to say otherwise! I'll skip the patch. I'll take a look at the 5.5.1 fix. Thanks for the patch.
          Hide
          erickerickson Erick Erickson added a comment -

          Anshum Gupta I made this a blocker in response to the move to cut 6.0 to be sure we consciously decide to include this or not...

          The revised regex is a little easier to understand I think, but whatever.... Maybe a comment about what the purpose of the regex is (i.e. exclude a hyphen as the first character)?

          And maybe add the fact that they cannot start with hypens to the getIdentifierMessage?

          Show
          erickerickson Erick Erickson added a comment - Anshum Gupta I made this a blocker in response to the move to cut 6.0 to be sure we consciously decide to include this or not... The revised regex is a little easier to understand I think, but whatever.... Maybe a comment about what the purpose of the regex is (i.e. exclude a hyphen as the first character)? And maybe add the fact that they cannot start with hypens to the getIdentifierMessage?
          Hide
          anshumg Anshum Gupta added a comment -

          but a working version of this is in 6.0, so I don't think we need this to be a blocker. I made this a blocker thinking on similar lines but then decided against it.

          It would be nice to add that information about hyphen not being allowed as the first character to the getIdentifiesMessage method though.

          Show
          anshumg Anshum Gupta added a comment - but a working version of this is in 6.0, so I don't think we need this to be a blocker. I made this a blocker thinking on similar lines but then decided against it. It would be nice to add that information about hyphen not being allowed as the first character to the getIdentifiesMessage method though.
          Hide
          erickerickson Erick Erickson added a comment -

          OK, Anshum is correct, this fix is in 6.0 so this shouldn't be a blocker, just have to decide what to do about 5.5.1 etc.

          And my comment about re-wording things was on the theory that the patch hadn't been committed yet, feel free to ignore.

          Show
          erickerickson Erick Erickson added a comment - OK, Anshum is correct, this fix is in 6.0 so this shouldn't be a blocker, just have to decide what to do about 5.5.1 etc. And my comment about re-wording things was on the theory that the patch hadn't been committed yet, feel free to ignore.
          Hide
          anshumg Anshum Gupta added a comment -

          Shawn Heisey the patch looks fine to me. I really wanted to get things in sync but seem like the code changed while enforcing more restrictions.
          Let's go with this as it's the same regular expression.

          Show
          anshumg Anshum Gupta added a comment - Shawn Heisey the patch looks fine to me. I really wanted to get things in sync but seem like the code changed while enforcing more restrictions. Let's go with this as it's the same regular expression.
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 9bca6db03223e1da07129e374100b74cc14f9a44 in lucene-solr's branch refs/heads/branch_5_5 from Anshum Gupta
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=9bca6db ]

          SOLR-8725: Allow hyphen in shard, collection, core, and alias names but not the first char (backport to 5.5 branch)

          Show
          jira-bot ASF subversion and git services added a comment - Commit 9bca6db03223e1da07129e374100b74cc14f9a44 in lucene-solr's branch refs/heads/branch_5_5 from Anshum Gupta [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=9bca6db ] SOLR-8725 : Allow hyphen in shard, collection, core, and alias names but not the first char (backport to 5.5 branch)
          Hide
          JoyceBabu Joyce Babu added a comment -

          Is this now available in the latest nightly?

          Show
          JoyceBabu Joyce Babu added a comment - Is this now available in the latest nightly?
          Hide
          anshumg Anshum Gupta added a comment -

          Yes, this would be available.

          Show
          anshumg Anshum Gupta added a comment - Yes, this would be available.
          Hide
          anshumg Anshum Gupta added a comment -

          Reopening this as it's not on the 5x branch but on 5.5.

          Show
          anshumg Anshum Gupta added a comment - Reopening this as it's not on the 5x branch but on 5.5.
          Hide
          stephlag Stephan Lagraulet added a comment -

          Can you close the issue as it is commited on the 5.5 branch?

          Show
          stephlag Stephan Lagraulet added a comment - Can you close the issue as it is commited on the 5.5 branch?
          Hide
          hossman Hoss Man added a comment -

          Manually correcting fixVersion per Step #S6 of LUCENE-7271

          Show
          hossman Hoss Man added a comment - Manually correcting fixVersion per Step #S6 of LUCENE-7271

            People

            • Assignee:
              anshumg Anshum Gupta
              Reporter:
              cbeer Chris Beer
            • Votes:
              7 Vote for this issue
              Watchers:
              16 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development