SOLR-17460

Error During Collection Migration from Solr 7.0 to Solr 8.4: Missing Files and Shard Restoration Failures

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Workaround
    • Affects Version/s: 7.0, 8.4
    • Fix Version/s: None
    • Component/s: hdfs, SolrCloud

    Description

      I was attempting to migrate a collection with 3 shards from a Solr 7.0 cluster to a Solr 8.4 cluster. The data is stored in HDFS. I followed the backup-restore process but encountered issues with two of the shards during the restoration.

      Migration Process:

      1- Backup Command: To avoid timeouts, I initiated the backup with an async parameter:

      curl -k --negotiate -u : 'https://<solrNode>:<solrPort>/solr/admin/collections?action=BACKUP&name=<backupName>&collection=<solrCollectionName>x&location=<hdfsPath>&async=12346'
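
Because the backup runs asynchronously, the call returns before the backup has finished, so it may be worth confirming completion before copying anything out of HDFS. The following is a minimal sketch, assuming the Collections API REQUESTSTATUS action is queried with the same request id. SOLR_URL, parse_state, and wait_for_async are hypothetical names introduced for illustration, and the polling interval and attempt limit are arbitrary.

```shell
#!/bin/sh
# Hypothetical base URL (assumption); replace with the real node and port.
SOLR_URL="${SOLR_URL:-https://solr.example.com:8985}"

# Extract the "state" value from a REQUESTSTATUS JSON response.
parse_state() {
  printf '%s\n' "$1" \
    | sed -n 's/.*"state"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' \
    | head -n 1
}

# Poll until the async request reaches a terminal state or attempts run out.
wait_for_async() {
  request_id="$1"
  attempts=0
  while [ "$attempts" -lt 360 ]; do
    response=$(curl -sk --negotiate -u : \
      "${SOLR_URL}/solr/admin/collections?action=REQUESTSTATUS&requestid=${request_id}&wt=json")
    state=$(parse_state "$response")
    case "$state" in
      completed) echo "request ${request_id} completed"; return 0 ;;
      failed|notfound) echo "request ${request_id}: ${state}" >&2; return 1 ;;
      *) sleep 10 ;;  # submitted, running, or no parsable answer: keep waiting
    esac
    attempts=$((attempts + 1))
  done
  echo "request ${request_id}: gave up waiting" >&2
  return 2
}
```

For the backup above this would be called as `wait_for_async 12346`, and only after it reports completion would the copy in the next step be started.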

      2- Copy Backup to Local: After the backup, I copied the data from HDFS to the local filesystem:

      hdfs dfs -copyToLocal <backupPath> <localPath>

      3- Transfer Backup to New Cluster: I then copied the backup files from the older Solr node to the newer one:

      scp -pr <localPath> <username>@<ip>:<localPathDestination>

      4- Prepare New HDFS Path: On the new Solr cluster, I created a new directory in HDFS and adjusted ownership:

      hdfs dfs -mkdir <pathName2>
      hdfs dfs -chown solr:solr <pathName2>

      5- Copy Backup to New HDFS Location: Before transferring, I removed the "<str>queryDocAuthorization</str>" entries from the solrconfig.xml file in the backup to make it compatible with the newer version. I then copied the backup data from the local filesystem to the new HDFS path:

      hdfs dfs -copyFromLocal <localPathDestination> <pathName2>
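
Since the backup passes through several copies (HDFS to local, scp, local to HDFS), a quick sanity check is to compare the number of files on each side after a copy. This is only a sketch under assumptions: it compares counts, not contents, and the helper names local_file_count, hdfs_file_count, and compare_counts are hypothetical. It relies on `hdfs dfs -count`, which prints the directory count, file count, content size, and path name.

```shell
#!/bin/sh
# Count regular files under a local directory (recursively).
local_file_count() {
  find "$1" -type f | wc -l | tr -d ' '
}

# Count files under an HDFS path (recursively).
# `hdfs dfs -count` prints: DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
hdfs_file_count() {
  hdfs dfs -count "$1" | awk '{print $2}'
}

# Compare two counts and report a mismatch with a non-zero exit status.
compare_counts() {
  local_n="$1"
  hdfs_n="$2"
  if [ "$local_n" -eq "$hdfs_n" ]; then
    echo "OK: ${local_n} files on both sides"
  else
    echo "MISMATCH: local=${local_n} hdfs=${hdfs_n}"
    return 1
  fi
}
```

A possible invocation, assuming <pathName2> was empty before the copy, is `compare_counts "$(local_file_count <localPathDestination>)" "$(hdfs_file_count <pathName2>)"`. A mismatch would point at an incomplete copy rather than at the restore itself.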

      6- Restore Collection: Finally, I ran the restore command:

      curl -k --negotiate -u : 'https://<solrNode>:<solrPort>/solr/admin/collections?action=RESTORE&name=<backupName>&collection=<solrCollectionName>x&location=<hdfs_path>&async=12345'

      Issue:

      After the restore process completed, I found that two of the shards could not be restored. The logs displayed the following errors:

      Error During Shard Restoration:

      ERROR [c:<solrCollectionName> s:shard2 r:core_node5 x:<solrCollectionName>_shard2_replica_n4] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Error CREATEing SolrCore '<solrCollectionName>_shard2_replica_n4': Unable to create core [<solrCollectionName>_shard2_replica_n4] Caused by: org.apache.solr.handler.component.QueryDocAuthorizationComponent.....
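
The error above still names QueryDocAuthorizationComponent, which may mean that some reference to it survived the solrconfig.xml edit (for example a class attribute rather than a "<str>" entry). This is an assumption, not a confirmed diagnosis. One way to check is to search the edited backup directory for any remaining mention. The function name find_authz_refs is hypothetical.

```shell
#!/bin/sh
# List files under a directory that still mention the component.
# The match is case-insensitive, so both the "queryDocAuthorization" name
# and the "QueryDocAuthorizationComponent" class name are found.
find_authz_refs() {
  grep -rli 'QueryDocAuthorization' "$1"
}
```

Running `find_authz_refs <localPathDestination>` before the copy to HDFS prints every file that still contains a reference. No output (and a non-zero exit status from grep) means none was found in that directory.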

      FileNotFoundException and Index Corruption:

      WARN (parallelCoreAdminExecutor-6-thread-7-processing-n:<solrNode>:<solrPort>_solr x:<solrCollectionName>_shard2_replica_n1 <numbers> RESTORECORE) [x:<solrCollectionName>_shard2_replica_n1] o.a.s.h.RestoreCore Could not switch to restored index. Rolling back to the current index => org.apache.lucene.index.CorruptIndexException: Unexpected file read error while reading index. (resource=BufferedChecksumIndexInput(segments_1g9dk))
      Caused by: java.io.FileNotFoundException: File does not exist: hdfs://<hdfsPath>/core_node2/data/restore/<fileName>

      It appears that Solr is looking for a file in HDFS that doesn't exist, despite no manual deletions being made. I cannot determine why these specific shards failed to restore, or why the system is unable to locate the required files.

      Expected Behavior:
      The backup and restore process should complete without errors, and all shards should be restored successfully to the new cluster.

      Actual Behavior:
      Two shards failed to restore, with errors related to missing files and index corruption.

          People

            Assignee: Unassigned
            Reporter: Arda (ardate)