  1. Solr
  2. SOLR-16670

Couldn't restore a backed up config set from S3


Details

    Description

       
      Solr 9.1.x currently does not let me fully restore a backup whose data and config set are stored in an S3 bucket. Every run fails with the error "The specified key does not exist". The full message is:
       

      An AmazonServiceException was thrown! [serviceName=S3] [awsRequestId=2C6] [httpStatus=404] [s3ErrorCode=NoSuchKey] [message=The specified key does not exist.] 

       
      After investigating further, I found that the path the isDirectory method uses to check whether a key is a directory makes the `S3Client.headObject` call fail. On line 324, a path pointing to a file is turned into a path ending with a slash. For example, given the path "path1/path2/backup-name/collection-name/zk_backup_0/configs/config-set-v1/configoverlay.json", `sanitizedDirPath` appends a "/" character, producing "path1/path2/backup-name/collection-name/zk_backup_0/configs/config-set-v1/configoverlay.json/". No object exists under that key, so the request returns 404 NoSuchKey. I can restore the backup if the cluster already has the config set definition in ZooKeeper, but because of this error I cannot restore the backed-up config set files into an empty cluster.
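
      The failure mode described above can be sketched in isolation. The following is a minimal illustration, not Solr's actual code: `FakeBucket`, `DirectoryCheckSketch`, and both `isDirectory*` variants are hypothetical names, and the bucket is modelled as a flat set of keys in which a directory is assumed to be represented by a key ending in "/". The tolerant variant shows one possible way a missing key could be treated as "not a directory" instead of an error.

```java
import java.util.Set;

/** Hypothetical stand-in for an S3 bucket: a flat set of object keys. */
final class FakeBucket {
    private final Set<String> keys;

    FakeBucket(Set<String> keys) {
        this.keys = keys;
    }

    /** Mimics a HEAD request: fails when the exact key is absent (like a 404 NoSuchKey). */
    void headObject(String key) {
        if (!keys.contains(key)) {
            throw new IllegalStateException("NoSuchKey: " + key);
        }
    }
}

final class DirectoryCheckSketch {
    /** Appends a trailing slash when it is missing, as the report describes. */
    static String sanitizedDirPath(String path) {
        return path.endsWith("/") ? path : path + "/";
    }

    /** Strict variant: the lookup failure propagates, reproducing the reported behaviour. */
    static boolean isDirectoryStrict(FakeBucket bucket, String path) {
        bucket.headObject(sanitizedDirPath(path));
        return true;
    }

    /** Tolerant variant (an assumption, not the actual fix): a missing key means "not a directory". */
    static boolean isDirectoryTolerant(FakeBucket bucket, String path) {
        try {
            bucket.headObject(sanitizedDirPath(path));
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```

      With a bucket that contains the key ".../config-set-v1/" and the key ".../config-set-v1/configoverlay.json", the strict variant throws for the file path because ".../configoverlay.json/" is not a key, while the tolerant variant returns false for the file and true for the directory.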
        
      For completeness, here are the other parts of my setup:
       
      Backup definition:

        <backup>
          <repository name="s3-repo" class="org.apache.solr.s3.S3BackupRepository" default="false">
            <str name="s3.bucket.name">com.dev.bucket.backup.folder</str>
            <str name="s3.region">us-east-2</str>
          </repository>
        </backup> 

       
      The backup folder structure on S3:
       

      .
      └── bucket-name
          └── path1
              └── path2
                  └── backup-name
                      └── collection-name
                          ├── backup_0.properties
                          ├── index ...
                          ├── shard_backup_metadata
                          │   └── md_shard1_0.json
                          └── zk_backup_0
                              ├── collection_state.json
                              └── configs
                                  └── config-set-v1
                                      ├── configoverlay.json
                                      ├── solrconfig.xml
                                      ├── stopwords.txt
                                      └── synonyms.txt 

       

       
      The cURL request I use for restore:

      curl -i -X POST \
         -H "Content-Type:application/json" \
         -d \
      '{
        "restore-collection": {
          "name": "backup-name",
          "collection": "collection-name-restored",
          "location": "path1/path2/",
          "repository": "s3-pro"
        }
      }' \
       'http://localhost:8983/api/c' 
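
      For anyone reproducing this from Java rather than the shell, the same request can be built with the JDK's `java.net.http` API. This is only a sketch: `RestoreRequestSketch` and its two methods are hypothetical helpers, and the body builder inserts the values verbatim without JSON escaping, so it assumes they contain no quotes or backslashes.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class RestoreRequestSketch {
    /** Builds the restore-collection JSON body; values are inserted as-is (no escaping). */
    static String restoreBody(String name, String collection, String location, String repository) {
        return String.format(
            "{\"restore-collection\": {\"name\": \"%s\", \"collection\": \"%s\", "
                + "\"location\": \"%s\", \"repository\": \"%s\"}}",
            name, collection, location, repository);
    }

    /** Builds (but does not send) the POST request against the endpoint used in the report. */
    static HttpRequest restoreRequest(String baseUrl, String body) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/c"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();
    }
}
```

      Sending it would be a matter of passing the result to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`.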

      This is the original question providing the same description.
       

       


          People

            houston Houston Putman
            ozlerhakan Hakan Özler

              Time Tracking

              Estimated: Not Specified
              Remaining: 0h
              Logged: 20m
