Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-10108

bin/solr script recursive copy broken

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.6, 7.0
    • Component/s: None
    • Security Level: Public (Default Security Level. Issues are Public)
    • Labels:
      None

      Description

      cp /r zk:/ fails with "cannot create //whatever".

      1. SOLR-10108.patch
        22 kB
        Erick Erickson
      2. SOLR-10108.patch
        22 kB
        Erick Erickson
      3. SOLR-10108.patch
        24 kB
        Erick Erickson

        Issue Links

          Activity

          Hide
          janhoy Jan Høydahl added a comment -

          What version? Seems to work here on master

          $ bin/solr start -c
          $ bin/solr zk -z localhost:9983 cp -r bin zk:/
          
          Connecting to ZooKeeper at localhost:9983 ...
          Copying from 'bin' to 'zk:/'. ZooKeeper at localhost:9983
          $ bin/solr zk -z localhost:9983 ls -r /bin
          
          Connecting to ZooKeeper at localhost:9983 ...
          Getting listing for Zookeeper node /bin from ZooKeeper at localhost:9983 recurse: true
          /bin
              post
              solr.in.cmd
              solr
              solr-8983.pid
              init.d
                     solr
              solr.in.sh
              install_solr_service.sh
              solr.cmd
              oom_solr.sh
          
          Show
          janhoy Jan Høydahl added a comment - What version? Seems to work here on master $ bin/solr start -c $ bin/solr zk -z localhost:9983 cp -r bin zk:/ Connecting to ZooKeeper at localhost:9983 ... Copying from 'bin' to 'zk:/'. ZooKeeper at localhost:9983 $ bin/solr zk -z localhost:9983 ls -r /bin Connecting to ZooKeeper at localhost:9983 ... Getting listing for Zookeeper node /bin from ZooKeeper at localhost:9983 recurse: true /bin post solr.in.cmd solr solr-8983.pid init.d solr solr.in.sh install_solr_service.sh solr.cmd oom_solr.sh
          Hide
          erickerickson Erick Erickson added a comment -

          Jan:

          Thanks for looking. I'm trying to go the other way (which I should have specified, just tried on 6x since I have it handy):

          bin/solr zk cp -r zk:/ ~/eoezk -z localhost:2181

          Fails with: ERROR: Invalid path string "//configs" caused by empty node name specified 1

          where
          bin/solr zk cp -r zk:/whatever ~/eoezk -z localhost:2181
          works.

          The context here was a client who'd accidentally removed the zoo_data directory on all ZKs but still had them running. We hit on the bright idea "hey, we can just dump all of the ZK data since the data is still available until we restart the ZK nodes" but ran into this when trying to copy stuff down.

          So I worked out the issues with the path and got the two-way copy to work, but also noticed another issue. Since ZK nodes can have data whether leaf nodes or not, the current process is lossy since non-leaf nodes don't get their data restored.

          This makes it impossible to backup the collection node and restore it since the collection can have a configset name as data. My take is that copying back and forth should restore intermediate node's data, do you (and others) concur?

          My first-attempt PoC is to create a very special file name something like node_zookeeper_solr.data to put any information associated with non-leaf nodes when the data is not empty as a PoC. That feels like a hack though as there's the possibility of collisions. Hmmm, maybe

          {generated_guid}

          .znode.solr.data? Still possibly a collision if someone, somehow managed to have a znode with a GUID followed by .znode.solr.data I suppose, but that's seems unlikely enough that I'm not willing to worry about it. How about "erick.erickson.was.here.data"? Maybe not.

          WDYT

          Show
          erickerickson Erick Erickson added a comment - Jan: Thanks for looking. I'm trying to go the other way (which I should have specified, just tried on 6x since I have it handy): bin/solr zk cp -r zk:/ ~/eoezk -z localhost:2181 Fails with: ERROR: Invalid path string "//configs" caused by empty node name specified 1 where bin/solr zk cp -r zk:/whatever ~/eoezk -z localhost:2181 works. The context here was a client who'd accidentally removed the zoo_data directory on all ZKs but still had them running. We hit on the bright idea "hey, we can just dump all of the ZK data since the data is still available until we restart the ZK nodes" but ran into this when trying to copy stuff down. So I worked out the issues with the path and got the two-way copy to work, but also noticed another issue. Since ZK nodes can have data whether leaf nodes or not, the current process is lossy since non-leaf nodes don't get their data restored. This makes it impossible to backup the collection node and restore it since the collection can have a configset name as data. My take is that copying back and forth should restore intermediate node's data, do you (and others) concur? My first-attempt PoC is to create a very special file name something like node_zookeeper_solr.data to put any information associated with non-leaf nodes when the data is not empty as a PoC. That feels like a hack though as there's the possibility of collisions. Hmmm, maybe {generated_guid} .znode.solr.data? Still possibly a collision if someone, somehow managed to have a znode with a GUID followed by .znode.solr.data I suppose, but that's seems unlikely enough that I'm not willing to worry about it. How about "erick.erickson.was.here.data"? Maybe not. WDYT
          Hide
          erickerickson Erick Erickson added a comment -

          Well, it works for me. If someone with a Windows setup could try the following I'd be grateful:

          Set up a SolrCloud instance, just enough to get it running and see the cloud graph.

          > 'bin/solr zk cp -r zk:/ file:/some/dir -z localhost:2181'
          > Shut down Solr
          > Re-initialize ZK (wipe all the data)
          > 'bin/solr zk cp -r file:/some/dir/ zk:/ -z localhost:2181'
          > start Solr again

          If I'm lucky, that'll start Solr up seamlessly. The idea is that the above should save the entire ZK state to a local drive and then restore it. We had a case where the Solr nodes were still up but all the ZK data had been lost and this would have been simpler than what we had to do.

          This is something of a side-effect of making the recursive copy just work to/from the root of ZK. There are problems with recursive copy in the first place when it's not from some node below the root without this patch.

          This patch also handles the fact that non-leaf nodes in Solr can have data and properly saves/restores that data.

          Show
          erickerickson Erick Erickson added a comment - Well, it works for me. If someone with a Windows setup could try the following I'd be grateful: Set up a SolrCloud instance, just enough to get it running and see the cloud graph. > 'bin/solr zk cp -r zk:/ file:/some/dir -z localhost:2181' > Shut down Solr > Re-initialize ZK (wipe all the data) > 'bin/solr zk cp -r file:/some/dir/ zk:/ -z localhost:2181' > start Solr again If I'm lucky, that'll start Solr up seamlessly. The idea is that the above should save the entire ZK state to a local drive and then restore it. We had a case where the Solr nodes were still up but all the ZK data had been lost and this would have been simpler than what we had to do. This is something of a side-effect of making the recursive copy just work to/from the root of ZK. There are problems with recursive copy in the first place when it's not from some node below the root without this patch. This patch also handles the fact that non-leaf nodes in Solr can have data and properly saves/restores that data.
          Hide
          erickerickson Erick Erickson added a comment -

          Still hoping someone will give this a whirl on Windows before committing... I'd like to get this in Solr 6.5 if/when.

          Otherwise I'll commit it this week sometime.

          Show
          erickerickson Erick Erickson added a comment - Still hoping someone will give this a whirl on Windows before committing... I'd like to get this in Solr 6.5 if/when. Otherwise I'll commit it this week sometime.
          Hide
          janhoy Jan Høydahl added a comment -

          So I took it for a spin. I started Solr master with patch applied i cloud mode, then created a collection.
          Then I copied all content of internal zk:/ (9983) to my disk /tmp/zoo. That worked.
          Then I stopped Solr, wiped zoo_data
          I then started a local ZK and copied everything onto zk:/solr (another root)
          Then started Solr with bin/solr start -z localhost:2181/solr/zoo. BINGO

          So your patch seems to work. But I could not figure out how to copy all files from my /tmp/zoo folder into zk root. Specifying solr zk cp /tmp/zoo/* zk:/ did not work:

          Could not complete the zk operation for reason: Path /tmp/zoo/* does not exist

          Show
          janhoy Jan Høydahl added a comment - So I took it for a spin. I started Solr master with patch applied i cloud mode, then created a collection. Then I copied all content of internal zk:/ (9983) to my disk /tmp/zoo. That worked. Then I stopped Solr, wiped zoo_data I then started a local ZK and copied everything onto zk:/solr (another root) Then started Solr with bin/solr start -z localhost:2181/solr/zoo. BINGO So your patch seems to work. But I could not figure out how to copy all files from my /tmp/zoo folder into zk root. Specifying solr zk cp /tmp/zoo/* zk:/ did not work: Could not complete the zk operation for reason: Path /tmp/zoo/* does not exist
          Hide
          erickerickson Erick Erickson added a comment -

          Jan:

          Thanks! try

          solr zk cp /tmp/zoo zk:/
          or
          solr zk cp /tmp/zoo/ zk:/
          

          I put a couple of examples in the help section, did I mess those up?

           to copy to local: 'bin/solr zk cp -r zk:/ /some/dir -z localhost:2181'
          to restore to ZK: 'bin/solr zk cp -r /some/dir/ zk:/ -z localhost:2181
          

          I struggled a bit with how much I wanted to recreate the Unix style cp commands, one of those cases where the differences will hurt you. The problem I remember having was that the shell gets in there and expands the '/tmp/*' form and gives me a list of the children of tmp that makes parsing the parameters "interesting".

          This is *nix though, right?

          Still hoping for a Windows user to give it a spin too.

          Show
          erickerickson Erick Erickson added a comment - Jan: Thanks! try solr zk cp /tmp/zoo zk:/ or solr zk cp /tmp/zoo/ zk:/ I put a couple of examples in the help section, did I mess those up? to copy to local: 'bin/solr zk cp -r zk:/ /some/dir -z localhost:2181' to restore to ZK: 'bin/solr zk cp -r /some/dir/ zk:/ -z localhost:2181 I struggled a bit with how much I wanted to recreate the Unix style cp commands, one of those cases where the differences will hurt you. The problem I remember having was that the shell gets in there and expands the '/tmp/*' form and gives me a list of the children of tmp that makes parsing the parameters "interesting". This is *nix though, right? Still hoping for a Windows user to give it a spin too.
          Hide
          janhoy Jan Høydahl added a comment - - edited

          Thanks, putting the / at the end of the src path (no asterisk) did the trick.

          The problem I remember having was that the shell gets in there and expands the '/tmp/*' form and gives me a list of the children of tmp that makes parsing the parameters "interesting".

          To avoid the shell parsing the *, you can quote just that part, such as "/tmp/zoo/*" and it will be passed as-is to the script. But that's when you get the error

          ERROR: Path /tmp/zoo/* does not exist
          

          So we have an option to look for a local path ending with /*, and then simply remote the trailing asterisk - which will mimic the intention of the user. We could of course also try to detect whether there are more than two arguments after cp, i.e. something else than src and dst, and in such case print a help message that instead of calling out the 3rd unexpected argument as the problem, instead print a help like

          ERROR: Too many arguments for command "cp". Note that the src and dst arguments do not support wildcard characters.
          
          Show
          janhoy Jan Høydahl added a comment - - edited Thanks, putting the / at the end of the src path (no asterisk) did the trick. The problem I remember having was that the shell gets in there and expands the '/tmp/*' form and gives me a list of the children of tmp that makes parsing the parameters "interesting". To avoid the shell parsing the * , you can quote just that part, such as "/tmp/zoo/*" and it will be passed as-is to the script. But that's when you get the error ERROR: Path /tmp/zoo/* does not exist So we have an option to look for a local path ending with /* , and then simply remote the trailing asterisk - which will mimic the intention of the user. We could of course also try to detect whether there are more than two arguments after cp , i.e. something else than src and dst, and in such case print a help message that instead of calling out the 3rd unexpected argument as the problem, instead print a help like ERROR: Too many arguments for command "cp". Note that the src and dst arguments do not support wildcard characters.
          Hide
          erickerickson Erick Erickson added a comment -

          Here's a quick patch that allows the local source to have a trailing wildcard. I also reworked the help screen and the error message. The param checking barfs on the first param it doesn't recognize, there's no specific handling for "cp" so I put in a plea to look at the help screen when there a non-quoted trailing wildcard.

          I also reworked the help for 'cp' a bit, I hope it's a little clearer. I really didn't like the trailing slash stuff. It's still there but not the only way now.

          I also noticed that the windows command file had trailing quotes and a bunch of other stuff so it'd be really helpful if someone who 1> knows windows commands and 2> has a windows box could give it a try, especially the trailing slash and trailing asterisk bits. I don't have any clue whether Windows will require quoting the trailing wildcard case.

          Thanks Jan! Your suggestion was a good one.

          Show
          erickerickson Erick Erickson added a comment - Here's a quick patch that allows the local source to have a trailing wildcard. I also reworked the help screen and the error message. The param checking barfs on the first param it doesn't recognize, there's no specific handling for "cp" so I put in a plea to look at the help screen when there a non-quoted trailing wildcard. I also reworked the help for 'cp' a bit, I hope it's a little clearer. I really didn't like the trailing slash stuff. It's still there but not the only way now. I also noticed that the windows command file had trailing quotes and a bunch of other stuff so it'd be really helpful if someone who 1> knows windows commands and 2> has a windows box could give it a try, especially the trailing slash and trailing asterisk bits. I don't have any clue whether Windows will require quoting the trailing wildcard case. Thanks Jan! Your suggestion was a good one.
          Hide
          janhoy Jan Høydahl added a comment -

          I put in a plea to look at the help screen when there a non-quoted trailing wildcard.

          How can you check this? I thought the shell would expand that before handling it over to our script. And I'm not sure it behaves the same way for Windows? But I cannot test Windows right now.

          Show
          janhoy Jan Høydahl added a comment - I put in a plea to look at the help screen when there a non-quoted trailing wildcard. How can you check this? I thought the shell would expand that before handling it over to our script. And I'm not sure it behaves the same way for Windows? But I cannot test Windows right now.
          Hide
          erickerickson Erick Erickson added a comment -

          bq: How can you check this?

          Basically I don't. The query parsing in the script just fails when it encounters some of the file names since it can't recognize the parameter so I added the bit about wildcards to the generic failure message.

          I've got a line on some other folks who can test Windows, thanks for your efforts here!

          Erick

          Show
          erickerickson Erick Erickson added a comment - bq: How can you check this? Basically I don't. The query parsing in the script just fails when it encounters some of the file names since it can't recognize the parameter so I added the bit about wildcards to the generic failure message. I've got a line on some other folks who can test Windows, thanks for your efforts here! Erick
          Hide
          erickerickson Erick Erickson added a comment -

          Final patch sans Windows testing. We can open new JIRAs if there are Windows issues.

          Show
          erickerickson Erick Erickson added a comment - Final patch sans Windows testing. We can open new JIRAs if there are Windows issues.
          Hide
          erickerickson Erick Erickson added a comment -

          Rats, typo in commit message.

          Trunk commit: 0b3ca1bb61cb6768ef99a4ee7f4ac05a71d19f56
          6x commit: c22c8bdebb9ab2a5cdb9951ccdec98a0f46f705e

          Show
          erickerickson Erick Erickson added a comment - Rats, typo in commit message. Trunk commit: 0b3ca1bb61cb6768ef99a4ee7f4ac05a71d19f56 6x commit: c22c8bdebb9ab2a5cdb9951ccdec98a0f46f705e

            People

            • Assignee:
              erickerickson Erick Erickson
              Reporter:
              erickerickson Erick Erickson
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development