Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-10249

Allow index fetching to return a detailed result instead of a true/false value

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: 6.4.1
    • Fix Version/s: 6.6
    • Component/s: replication (java)
    • Security Level: Public (Default Security Level. Issues are Public)
    • Labels:
    • Environment:

      Any

    • Flags:
      Patch

      Description

      This gives us the ability to see into why a replication might of failed and act on it if we need to. We use this enhancement for logging conditions so we can quantify what is happening with replication, get success rates, etc.

      The idea is to create a public static class IndexFetchResult as an inner class to IndexFetcher that has strings that hold statuses that could occur while fetching an index.

      1. SOLR_10249.patch
        12 kB
        David Smiley

        Issue Links

          Activity

          Hide
          millerjeff0 Jeff Miller added a comment -

          Diffs for my local testing

          diff --git a/solr/core/src/java/org/apache/solr/cloud/RecoveryStrategy.java b/solr/core/src/java/org/apache/solr/cloud/RecoveryStrategy.java
          index 90e515a..2483e69 100644
          — a/solr/core/src/java/org/apache/solr/cloud/RecoveryStrategy.java
          +++ b/solr/core/src/java/org/apache/solr/cloud/RecoveryStrategy.java
          @@ -153,7 +153,7 @@ public class RecoveryStrategy extends Thread implements Closeable {
          solrParams.set(ReplicationHandler.MASTER_URL, leaderUrl);

          if (isClosed()) return; // we check closed on return

          • boolean success = replicationHandler.doFetch(solrParams, false);
            + boolean success = replicationHandler.doFetch(solrParams, false).getStatus();

          if (!success) {
          throw new SolrException(ErrorCode.SERVER_ERROR, "Replication for recovery failed.");
          diff --git a/solr/core/src/java/org/apache/solr/handler/CdcrRequestHandler.java b/solr/core/src/java/org/apache/solr/handler/CdcrRequestHandler.java
          index f706637..a65299a 100644
          — a/solr/core/src/java/org/apache/solr/handler/CdcrRequestHandler.java
          +++ b/solr/core/src/java/org/apache/solr/handler/CdcrRequestHandler.java
          @@ -754,7 +754,7 @@ public class CdcrRequestHandler extends RequestHandlerBase implements SolrCoreAw
          // we do not want the raw tlog files from the source
          solrParams.set(ReplicationHandler.TLOG_FILES, false);

          • success = replicationHandler.doFetch(solrParams, false);
            + success = replicationHandler.doFetch(solrParams, false).getStatus();

          // this is required because this callable can race with HttpSolrCall#destroy
          // which clears the request info.
          diff --git a/solr/core/src/java/org/apache/solr/handler/IndexFetcher.java b/solr/core/src/java/org/apache/solr/handler/IndexFetcher.java
          index b9d9f51..281e660 100644
          — a/solr/core/src/java/org/apache/solr/handler/IndexFetcher.java
          +++ b/solr/core/src/java/org/apache/solr/handler/IndexFetcher.java
          @@ -106,6 +106,8 @@ import static org.apache.solr.common.params.CommonParams.JAVABIN;
          import static org.apache.solr.common.params.CommonParams.NAME;
          import static org.apache.solr.handler.ReplicationHandler.*;

          +import com.google.common.base.Strings;
          +
          /**

          • <p> Provides functionality of downloading changed index files as well as config files and a timer for scheduling fetches from the
          • master. </p>
            @@ -161,6 +163,52 @@ public class IndexFetcher {

          private Integer soTimeout;

          + private static final String INTERRUPT_RESPONSE_MESSAGE = "Interrupted while waiting for modify lock";
          +
          + public static class IndexFetchResult {
          + private final String message;
          + private final boolean status;
          + private final Throwable exception;
          +
          + public static final String FAILED_BY_INTERRUPT_MESSAGE = "Fetching index failed by interrupt";
          + public static final String FAILED_BY_EXCEPTION_MESSAGE = "Fetching index failed by exception";
          +
          + /** pre-defined results */
          + public static final IndexFetchResult ALREADY_IN_SYNC = new IndexFetchResult("Local index commit is already in sync with peer", true, null);
          + public static final IndexFetchResult INDEX_FETCH_FAILURE = new IndexFetchResult("Fetching lastest index is failed", false, null);
          + public static final IndexFetchResult INDEX_FETCH_SUCCESS = new IndexFetchResult("Fetching latest index is successful", true, null);
          + public static final IndexFetchResult LOCK_OBTAIN_FAILED = new IndexFetchResult("Obtaining SnapPuller lock failed", false, null);
          + public static final IndexFetchResult MASTER_VERSION_ZERO = new IndexFetchResult("Index in peer is empty and never committed yet", true, null);
          + public static final IndexFetchResult NO_INDEX_COMMIT_EXIST = new IndexFetchResult("No IndexCommit in local index", false, null);
          + public static final IndexFetchResult PEER_INDEX_COMMIT_DELETED = new IndexFetchResult("No files to download because IndexCommit in peer was deleted", false, null);
          + // SFDC: adding a new failure result when replication is aborted because of local activity
          + public static final IndexFetchResult LOCAL_ACTIVITY_DURING_REPLICATION = new IndexFetchResult("Local index modification during replication", false, null);
          +
          + IndexFetchResult(String message, boolean status, Throwable exception)

          { + this.message = message; + this.status = status; + this.exception = exception; + }

          +
          + /*
          + * @return exception thrown if failed by exception or interrupt, otherwise null
          + */
          + public Throwable getException()

          { + return this.exception; + }

          +
          + /*
          + * @return true if index fetch was successful, false otherwise
          + */
          + public boolean getStatus()

          { + return this.status; + }

          +
          + public String getMessage()

          { + return this.message; + }

          + }
          +
          private static HttpClient createHttpClient(SolrCore core, String httpBasicAuthUser, String httpBasicAuthPassword, boolean useCompression) {
          final ModifiableSolrParams httpClientParams = new ModifiableSolrParams();
          httpClientParams.set(HttpClientUtil.PROP_BASIC_AUTH_USER, httpBasicAuthUser);
          @@ -264,7 +312,7 @@ public class IndexFetcher {
          }
          }

          • boolean fetchLatestIndex(boolean forceReplication) throws IOException, InterruptedException {
            + IndexFetchResult fetchLatestIndex(boolean forceReplication) throws IOException, InterruptedException { return fetchLatestIndex(forceReplication, false); }

          @@ -277,7 +325,7 @@ public class IndexFetcher {

          • @return true on success, false if slave is already in sync
          • @throws IOException if an exception occurs
            */
          • boolean fetchLatestIndex(boolean forceReplication, boolean forceCoreReload) throws IOException, InterruptedException {
            + IndexFetchResult fetchLatestIndex(boolean forceReplication, boolean forceCoreReload) throws IOException, InterruptedException {

          boolean cleanupDone = false;
          boolean successfulInstall = false;
          @@ -302,9 +350,16 @@ public class IndexFetcher {
          try

          { response = getLatestVersion(); }

          catch (Exception e)

          { - LOG.error("Master at: " + masterUrl + " is not available. Index fetch failed. Exception: " + e.getMessage()); - return false; - }

          + final String errorMsg = e.getMessage();
          + if (!Strings.isNullOrEmpty(errorMsg) && errorMsg.contains(INTERRUPT_RESPONSE_MESSAGE))

          { + LOG.warn("Master at: " + masterUrl + " is not available. Index fetch failed by interrupt. Exception: " + errorMsg); + return new IndexFetchResult(IndexFetchResult.FAILED_BY_INTERRUPT_MESSAGE, false, e); + }

          else

          { + LOG.warn("Master at: " + masterUrl + " is not available. Index fetch failed by exception: " + errorMsg); + return new IndexFetchResult(IndexFetchResult.FAILED_BY_EXCEPTION_MESSAGE, false, e); + }

          + }
          +
          long latestVersion = (Long) response.get(CMD_INDEX_VERSION);
          long latestGeneration = (Long) response.get(GENERATION);

          @@ -320,7 +375,7 @@ public class IndexFetcher {
          searcherRefCounted = solrCore.getNewestSearcher(false);
          if (searcherRefCounted == null)

          { LOG.warn("No open searcher found - fetch aborted"); - return false; + return IndexFetchResult.NO_INDEX_COMMIT_EXIST; }

          commit = searcherRefCounted.get().getIndexReader().getIndexCommit();
          } finally {
          @@ -347,7 +402,7 @@ public class IndexFetcher

          { //there is nothing to be replicated successfulInstall = true; - return true; + return IndexFetchResult.MASTER_VERSION_ZERO; }

          // TODO: Should we be comparing timestamps (across machines) here?
          @@ -355,14 +410,14 @@ public class IndexFetcher

          { //master and slave are already in sync just return LOG.info("Slave in sync with master."); successfulInstall = true; - return true; + return IndexFetchResult.ALREADY_IN_SYNC; }

          LOG.info("Starting replication process");
          // get the list of files first
          fetchFileList(latestGeneration);
          // this can happen if the commit point is deleted before we fetch the file list.
          if (filesToDownload.isEmpty())

          { - return false; + return IndexFetchResult.PEER_INDEX_COMMIT_DELETED; }

          LOG.info("Number of files in latest index in master: " + filesToDownload.size());
          if (tlogFilesToDownload != null) {
          @@ -550,14 +605,14 @@ public class IndexFetcher {
          LOG.warn(
          "Replication attempt was not successful - trying a full index replication reloadCore={}",
          reloadCore);

          • successfulInstall = fetchLatestIndex(true, reloadCore);
            + successfulInstall = fetchLatestIndex(true, reloadCore).getStatus();
            }

          markReplicationStop();

          • return successfulInstall;
            + return successfulInstall ? IndexFetchResult.INDEX_FETCH_SUCCESS : IndexFetchResult.INDEX_FETCH_FAILURE;
            } catch (ReplicationHandlerException e) { LOG.error("User aborted Replication"); - return false; + return new IndexFetchResult(IndexFetchResult.FAILED_BY_EXCEPTION_MESSAGE, false, e); }

            catch (SolrException e)

            { throw e; }

            catch (InterruptedException e) {
            diff --git a/solr/core/src/java/org/apache/solr/handler/ReplicationHandler.java b/solr/core/src/java/org/apache/solr/handler/ReplicationHandler.java
            index 84e1ba2..becec83 100644

              • a/solr/core/src/java/org/apache/solr/handler/ReplicationHandler.java
                +++ b/solr/core/src/java/org/apache/solr/handler/ReplicationHandler.java
                @@ -88,6 +88,7 @@ import org.apache.solr.core.SolrEventListener;
                import org.apache.solr.core.backup.repository.BackupRepository;
                import org.apache.solr.core.backup.repository.LocalFileSystemRepository;
                import org.apache.solr.core.snapshots.SolrSnapshotMetaDataManager;
                +import org.apache.solr.handler.IndexFetcher.IndexFetchResult;
                import org.apache.solr.request.SolrQueryRequest;
                import org.apache.solr.response.SolrQueryResponse;
                import org.apache.solr.search.SolrIndexSearcher;
                @@ -381,10 +382,10 @@ public class ReplicationHandler extends RequestHandlerBase implements SolrCoreAw

          private volatile IndexFetcher currentIndexFetcher;

          • public boolean doFetch(SolrParams solrParams, boolean forceReplication) {
            + public IndexFetchResult doFetch(SolrParams solrParams, boolean forceReplication) {
            String masterUrl = solrParams == null ? null : solrParams.get(MASTER_URL);
            if (!indexFetchLock.tryLock())
          • return false;
            + return IndexFetchResult.LOCK_OBTAIN_FAILED;
            try {
            if (masterUrl != null) {
            if (currentIndexFetcher != null && currentIndexFetcher != pollingIndexFetcher)
            Unknown macro: {@@ -400,17 +401,16 @@ public class ReplicationHandler extends RequestHandlerBase implements SolrCoreAw if (currentIndexFetcher != pollingIndexFetcher) { currentIndexFetcher.destroy(); }+ return new IndexFetchResult(IndexFetchResult.FAILED_BY_EXCEPTION_MESSAGE, false, e); }

            finally {
            if (pollingIndexFetcher != null) {
            if( currentIndexFetcher != pollingIndexFetcher)

            { currentIndexFetcher.destroy(); }
          • currentIndexFetcher = pollingIndexFetcher;
            }
            indexFetchLock.unlock();
            }

          • return false;
            }

          boolean isReplicating() {

          Show
          millerjeff0 Jeff Miller added a comment - Diffs for my local testing diff --git a/solr/core/src/java/org/apache/solr/cloud/RecoveryStrategy.java b/solr/core/src/java/org/apache/solr/cloud/RecoveryStrategy.java index 90e515a..2483e69 100644 — a/solr/core/src/java/org/apache/solr/cloud/RecoveryStrategy.java +++ b/solr/core/src/java/org/apache/solr/cloud/RecoveryStrategy.java @@ -153,7 +153,7 @@ public class RecoveryStrategy extends Thread implements Closeable { solrParams.set(ReplicationHandler.MASTER_URL, leaderUrl); if (isClosed()) return; // we check closed on return boolean success = replicationHandler.doFetch(solrParams, false); + boolean success = replicationHandler.doFetch(solrParams, false).getStatus(); if (!success) { throw new SolrException(ErrorCode.SERVER_ERROR, "Replication for recovery failed."); diff --git a/solr/core/src/java/org/apache/solr/handler/CdcrRequestHandler.java b/solr/core/src/java/org/apache/solr/handler/CdcrRequestHandler.java index f706637..a65299a 100644 — a/solr/core/src/java/org/apache/solr/handler/CdcrRequestHandler.java +++ b/solr/core/src/java/org/apache/solr/handler/CdcrRequestHandler.java @@ -754,7 +754,7 @@ public class CdcrRequestHandler extends RequestHandlerBase implements SolrCoreAw // we do not want the raw tlog files from the source solrParams.set(ReplicationHandler.TLOG_FILES, false); success = replicationHandler.doFetch(solrParams, false); + success = replicationHandler.doFetch(solrParams, false).getStatus(); // this is required because this callable can race with HttpSolrCall#destroy // which clears the request info. diff --git a/solr/core/src/java/org/apache/solr/handler/IndexFetcher.java b/solr/core/src/java/org/apache/solr/handler/IndexFetcher.java index b9d9f51..281e660 100644 — a/solr/core/src/java/org/apache/solr/handler/IndexFetcher.java +++ b/solr/core/src/java/org/apache/solr/handler/IndexFetcher.java @@ -106,6 +106,8 @@ import static org.apache.solr.common.params.CommonParams.JAVABIN; import static org.apache.solr.common.params.CommonParams.NAME; import static org.apache.solr.handler.ReplicationHandler.*; +import com.google.common.base.Strings; + /** <p> Provides functionality of downloading changed index files as well as config files and a timer for scheduling fetches from the master. </p> @@ -161,6 +163,52 @@ public class IndexFetcher { private Integer soTimeout; + private static final String INTERRUPT_RESPONSE_MESSAGE = "Interrupted while waiting for modify lock"; + + public static class IndexFetchResult { + private final String message; + private final boolean status; + private final Throwable exception; + + public static final String FAILED_BY_INTERRUPT_MESSAGE = "Fetching index failed by interrupt"; + public static final String FAILED_BY_EXCEPTION_MESSAGE = "Fetching index failed by exception"; + + /** pre-defined results */ + public static final IndexFetchResult ALREADY_IN_SYNC = new IndexFetchResult("Local index commit is already in sync with peer", true, null); + public static final IndexFetchResult INDEX_FETCH_FAILURE = new IndexFetchResult("Fetching lastest index is failed", false, null); + public static final IndexFetchResult INDEX_FETCH_SUCCESS = new IndexFetchResult("Fetching latest index is successful", true, null); + public static final IndexFetchResult LOCK_OBTAIN_FAILED = new IndexFetchResult("Obtaining SnapPuller lock failed", false, null); + public static final IndexFetchResult MASTER_VERSION_ZERO = new IndexFetchResult("Index in peer is empty and never committed yet", true, null); + public static final IndexFetchResult NO_INDEX_COMMIT_EXIST = new IndexFetchResult("No IndexCommit in local index", false, null); + public static final IndexFetchResult PEER_INDEX_COMMIT_DELETED = new IndexFetchResult("No files to download because IndexCommit in peer was deleted", false, null); + // SFDC: adding a new failure result when replication is aborted because of local activity + public static final IndexFetchResult LOCAL_ACTIVITY_DURING_REPLICATION = new IndexFetchResult("Local index modification during replication", false, null); + + IndexFetchResult(String message, boolean status, Throwable exception) { + this.message = message; + this.status = status; + this.exception = exception; + } + + /* + * @return exception thrown if failed by exception or interrupt, otherwise null + */ + public Throwable getException() { + return this.exception; + } + + /* + * @return true if index fetch was successful, false otherwise + */ + public boolean getStatus() { + return this.status; + } + + public String getMessage() { + return this.message; + } + } + private static HttpClient createHttpClient(SolrCore core, String httpBasicAuthUser, String httpBasicAuthPassword, boolean useCompression) { final ModifiableSolrParams httpClientParams = new ModifiableSolrParams(); httpClientParams.set(HttpClientUtil.PROP_BASIC_AUTH_USER, httpBasicAuthUser); @@ -264,7 +312,7 @@ public class IndexFetcher { } } boolean fetchLatestIndex(boolean forceReplication) throws IOException, InterruptedException { + IndexFetchResult fetchLatestIndex(boolean forceReplication) throws IOException, InterruptedException { return fetchLatestIndex(forceReplication, false); } @@ -277,7 +325,7 @@ public class IndexFetcher { @return true on success, false if slave is already in sync @throws IOException if an exception occurs */ boolean fetchLatestIndex(boolean forceReplication, boolean forceCoreReload) throws IOException, InterruptedException { + IndexFetchResult fetchLatestIndex(boolean forceReplication, boolean forceCoreReload) throws IOException, InterruptedException { boolean cleanupDone = false; boolean successfulInstall = false; @@ -302,9 +350,16 @@ public class IndexFetcher { try { response = getLatestVersion(); } catch (Exception e) { - LOG.error("Master at: " + masterUrl + " is not available. Index fetch failed. Exception: " + e.getMessage()); - return false; - } + final String errorMsg = e.getMessage(); + if (!Strings.isNullOrEmpty(errorMsg) && errorMsg.contains(INTERRUPT_RESPONSE_MESSAGE)) { + LOG.warn("Master at: " + masterUrl + " is not available. Index fetch failed by interrupt. Exception: " + errorMsg); + return new IndexFetchResult(IndexFetchResult.FAILED_BY_INTERRUPT_MESSAGE, false, e); + } else { + LOG.warn("Master at: " + masterUrl + " is not available. Index fetch failed by exception: " + errorMsg); + return new IndexFetchResult(IndexFetchResult.FAILED_BY_EXCEPTION_MESSAGE, false, e); + } + } + long latestVersion = (Long) response.get(CMD_INDEX_VERSION); long latestGeneration = (Long) response.get(GENERATION); @@ -320,7 +375,7 @@ public class IndexFetcher { searcherRefCounted = solrCore.getNewestSearcher(false); if (searcherRefCounted == null) { LOG.warn("No open searcher found - fetch aborted"); - return false; + return IndexFetchResult.NO_INDEX_COMMIT_EXIST; } commit = searcherRefCounted.get().getIndexReader().getIndexCommit(); } finally { @@ -347,7 +402,7 @@ public class IndexFetcher { //there is nothing to be replicated successfulInstall = true; - return true; + return IndexFetchResult.MASTER_VERSION_ZERO; } // TODO: Should we be comparing timestamps (across machines) here? @@ -355,14 +410,14 @@ public class IndexFetcher { //master and slave are already in sync just return LOG.info("Slave in sync with master."); successfulInstall = true; - return true; + return IndexFetchResult.ALREADY_IN_SYNC; } LOG.info("Starting replication process"); // get the list of files first fetchFileList(latestGeneration); // this can happen if the commit point is deleted before we fetch the file list. if (filesToDownload.isEmpty()) { - return false; + return IndexFetchResult.PEER_INDEX_COMMIT_DELETED; } LOG.info("Number of files in latest index in master: " + filesToDownload.size()); if (tlogFilesToDownload != null) { @@ -550,14 +605,14 @@ public class IndexFetcher { LOG.warn( "Replication attempt was not successful - trying a full index replication reloadCore={}", reloadCore); successfulInstall = fetchLatestIndex(true, reloadCore); + successfulInstall = fetchLatestIndex(true, reloadCore).getStatus(); } markReplicationStop(); return successfulInstall; + return successfulInstall ? IndexFetchResult.INDEX_FETCH_SUCCESS : IndexFetchResult.INDEX_FETCH_FAILURE; } catch (ReplicationHandlerException e) { LOG.error("User aborted Replication"); - return false; + return new IndexFetchResult(IndexFetchResult.FAILED_BY_EXCEPTION_MESSAGE, false, e); } catch (SolrException e) { throw e; } catch (InterruptedException e) { diff --git a/solr/core/src/java/org/apache/solr/handler/ReplicationHandler.java b/solr/core/src/java/org/apache/solr/handler/ReplicationHandler.java index 84e1ba2..becec83 100644 a/solr/core/src/java/org/apache/solr/handler/ReplicationHandler.java +++ b/solr/core/src/java/org/apache/solr/handler/ReplicationHandler.java @@ -88,6 +88,7 @@ import org.apache.solr.core.SolrEventListener; import org.apache.solr.core.backup.repository.BackupRepository; import org.apache.solr.core.backup.repository.LocalFileSystemRepository; import org.apache.solr.core.snapshots.SolrSnapshotMetaDataManager; +import org.apache.solr.handler.IndexFetcher.IndexFetchResult; import org.apache.solr.request.SolrQueryRequest; import org.apache.solr.response.SolrQueryResponse; import org.apache.solr.search.SolrIndexSearcher; @@ -381,10 +382,10 @@ public class ReplicationHandler extends RequestHandlerBase implements SolrCoreAw private volatile IndexFetcher currentIndexFetcher; public boolean doFetch(SolrParams solrParams, boolean forceReplication) { + public IndexFetchResult doFetch(SolrParams solrParams, boolean forceReplication) { String masterUrl = solrParams == null ? null : solrParams.get(MASTER_URL); if (!indexFetchLock.tryLock()) return false; + return IndexFetchResult.LOCK_OBTAIN_FAILED; try { if (masterUrl != null) { if (currentIndexFetcher != null && currentIndexFetcher != pollingIndexFetcher) Unknown macro: {@@ -400,17 +401,16 @@ public class ReplicationHandler extends RequestHandlerBase implements SolrCoreAw if (currentIndexFetcher != pollingIndexFetcher) { currentIndexFetcher.destroy(); }+ return new IndexFetchResult(IndexFetchResult.FAILED_BY_EXCEPTION_MESSAGE, false, e); } finally { if (pollingIndexFetcher != null) { if( currentIndexFetcher != pollingIndexFetcher) { currentIndexFetcher.destroy(); } currentIndexFetcher = pollingIndexFetcher; } indexFetchLock.unlock(); } return false; } boolean isReplicating() {
          Hide
          ichattopadhyaya Ishan Chattopadhyaya added a comment -

          Moving to 6.5, since 6.4 has already been released.

          Show
          ichattopadhyaya Ishan Chattopadhyaya added a comment - Moving to 6.5, since 6.4 has already been released.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user dsmiley commented on the issue:

          https://github.com/apache/lucene-solr/pull/169

          @millerjeff0 when you created this PR, you didn't put SOLR-10249 in in the title and therefore the PR and the Jira issue aren't linked. Can you edit it and do that please?

          Show
          githubbot ASF GitHub Bot added a comment - Github user dsmiley commented on the issue: https://github.com/apache/lucene-solr/pull/169 @millerjeff0 when you created this PR, you didn't put SOLR-10249 in in the title and therefore the PR and the Jira issue aren't linked. Can you edit it and do that please?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user millerjeff0 commented on the issue:

          https://github.com/apache/lucene-solr/pull/169

          Thanks I was not aware I needed to do so, fixed it

          Show
          githubbot ASF GitHub Bot added a comment - Github user millerjeff0 commented on the issue: https://github.com/apache/lucene-solr/pull/169 Thanks I was not aware I needed to do so, fixed it
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user dsmiley commented on a diff in the pull request:

          https://github.com/apache/lucene-solr/pull/169#discussion_r107513516

          — Diff: solr/core/src/java/org/apache/solr/handler/IndexFetcher.java —
          @@ -321,9 +369,16 @@ boolean fetchLatestIndex(boolean forceReplication, boolean forceCoreReload) thro
          try

          { response = getLatestVersion(); }

          catch (Exception e)

          { - LOG.error("Master at: " + masterUrl + " is not available. Index fetch failed. Exception: " + e.getMessage()); - return false; - }

          + final String errorMsg = e.getMessage();
          — End diff –

          Granted e.getMessage() was what this code was doing before you made changes but can you please use e.toString() instead? IMO, e.getMessage() should rarely if ever be called instead of e.toString() because e.toString() critically contains the name of the exception itself. The "message" alone can be unclear without the exception class. Consider `FileNotFoundException` and there are others.

          Show
          githubbot ASF GitHub Bot added a comment - Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/169#discussion_r107513516 — Diff: solr/core/src/java/org/apache/solr/handler/IndexFetcher.java — @@ -321,9 +369,16 @@ boolean fetchLatestIndex(boolean forceReplication, boolean forceCoreReload) thro try { response = getLatestVersion(); } catch (Exception e) { - LOG.error("Master at: " + masterUrl + " is not available. Index fetch failed. Exception: " + e.getMessage()); - return false; - } + final String errorMsg = e.getMessage(); — End diff – Granted e.getMessage() was what this code was doing before you made changes but can you please use e.toString() instead? IMO, e.getMessage() should rarely if ever be called instead of e.toString() because e.toString() critically contains the name of the exception itself. The "message" alone can be unclear without the exception class. Consider `FileNotFoundException` and there are others.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user dsmiley commented on a diff in the pull request:

          https://github.com/apache/lucene-solr/pull/169#discussion_r107514170

          — Diff: solr/core/src/java/org/apache/solr/handler/IndexFetcher.java —
          @@ -161,6 +163,52 @@

          private Integer soTimeout;

          + private static final String INTERRUPT_RESPONSE_MESSAGE = "Interrupted while waiting for modify lock";
          +
          + public static class IndexFetchResult {
          — End diff –

          I really like how you modeled this return object.

          Show
          githubbot ASF GitHub Bot added a comment - Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/169#discussion_r107514170 — Diff: solr/core/src/java/org/apache/solr/handler/IndexFetcher.java — @@ -161,6 +163,52 @@ private Integer soTimeout; + private static final String INTERRUPT_RESPONSE_MESSAGE = "Interrupted while waiting for modify lock"; + + public static class IndexFetchResult { — End diff – I really like how you modeled this return object.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user dsmiley commented on a diff in the pull request:

          https://github.com/apache/lucene-solr/pull/169#discussion_r107512860

          — Diff: solr/core/src/java/org/apache/solr/handler/IndexFetcher.java —
          @@ -311,7 +359,7 @@ boolean fetchLatestIndex(boolean forceReplication, boolean forceCoreReload) thro
          Replica replica = getLeaderReplica();
          CloudDescriptor cd = solrCore.getCoreDescriptor().getCloudDescriptor();
          if (cd.getCoreNodeName().equals(replica.getName())) {

          • return false;
            + return IndexFetchResult.CORE_NODE_IS_REPLICA;
              • End diff –

          Maybe call this NOT_LEADER ? Even a leader is a replica (it's the role a replica plays). Granted many of us promote this ambiguity because we haven't clearly/consistently labelled a non-leader, e.g. a follower.

          Show
          githubbot ASF GitHub Bot added a comment - Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/169#discussion_r107512860 — Diff: solr/core/src/java/org/apache/solr/handler/IndexFetcher.java — @@ -311,7 +359,7 @@ boolean fetchLatestIndex(boolean forceReplication, boolean forceCoreReload) thro Replica replica = getLeaderReplica(); CloudDescriptor cd = solrCore.getCoreDescriptor().getCloudDescriptor(); if (cd.getCoreNodeName().equals(replica.getName())) { return false; + return IndexFetchResult.CORE_NODE_IS_REPLICA; End diff – Maybe call this NOT_LEADER ? Even a leader is a replica (it's the role a replica plays). Granted many of us promote this ambiguity because we haven't clearly/consistently labelled a non-leader, e.g. a follower.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user dsmiley commented on a diff in the pull request:

          https://github.com/apache/lucene-solr/pull/169#discussion_r107514287

          — Diff: solr/core/src/java/org/apache/solr/handler/IndexFetcher.java —
          @@ -161,6 +163,52 @@

          private Integer soTimeout;

          + private static final String INTERRUPT_RESPONSE_MESSAGE = "Interrupted while waiting for modify lock";
          +
          + public static class IndexFetchResult {
          + private final String message;
          + private final boolean status;
          — End diff –

          perhaps rename "status" to "successful" (getter too)?

          Show
          githubbot ASF GitHub Bot added a comment - Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/169#discussion_r107514287 — Diff: solr/core/src/java/org/apache/solr/handler/IndexFetcher.java — @@ -161,6 +163,52 @@ private Integer soTimeout; + private static final String INTERRUPT_RESPONSE_MESSAGE = "Interrupted while waiting for modify lock"; + + public static class IndexFetchResult { + private final String message; + private final boolean status; — End diff – perhaps rename "status" to "successful" (getter too)?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user millerjeff0 commented on a diff in the pull request:

          https://github.com/apache/lucene-solr/pull/169#discussion_r107526774

          — Diff: solr/core/src/java/org/apache/solr/handler/IndexFetcher.java —
          @@ -321,9 +369,16 @@ boolean fetchLatestIndex(boolean forceReplication, boolean forceCoreReload) thro
          try

          { response = getLatestVersion(); }

          catch (Exception e)

          { - LOG.error("Master at: " + masterUrl + " is not available. Index fetch failed. Exception: " + e.getMessage()); - return false; - }

          + final String errorMsg = e.getMessage();
          — End diff –

          Great point, thanks

          Show
          githubbot ASF GitHub Bot added a comment - Github user millerjeff0 commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/169#discussion_r107526774 — Diff: solr/core/src/java/org/apache/solr/handler/IndexFetcher.java — @@ -321,9 +369,16 @@ boolean fetchLatestIndex(boolean forceReplication, boolean forceCoreReload) thro try { response = getLatestVersion(); } catch (Exception e) { - LOG.error("Master at: " + masterUrl + " is not available. Index fetch failed. Exception: " + e.getMessage()); - return false; - } + final String errorMsg = e.getMessage(); — End diff – Great point, thanks
          Hide
          dsmiley David Smiley added a comment -

          Here's the patch. All tests pass and precommit. I'll commit later today.

          Thanks for the contribution Jeff.

          Show
          dsmiley David Smiley added a comment - Here's the patch. All tests pass and precommit. I'll commit later today. Thanks for the contribution Jeff.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user tflobbe commented on a diff in the pull request:

          https://github.com/apache/lucene-solr/pull/169#discussion_r107954150

          — Diff: solr/core/src/java/org/apache/solr/handler/IndexFetcher.java —
          @@ -161,6 +163,52 @@

          private Integer soTimeout;

          + private static final String INTERRUPT_RESPONSE_MESSAGE = "Interrupted while waiting for modify lock";
          +
          + public static class IndexFetchResult {
          + private final String message;
          + private final boolean successful;
          + private final Throwable exception;
          +
          + public static final String FAILED_BY_INTERRUPT_MESSAGE = "Fetching index failed by interrupt";
          + public static final String FAILED_BY_EXCEPTION_MESSAGE = "Fetching index failed by exception";
          +
          + /** pre-defined results */
          + public static final IndexFetchResult ALREADY_IN_SYNC = new IndexFetchResult("Local index commit is already in sync with peer", true, null);
          + public static final IndexFetchResult INDEX_FETCH_FAILURE = new IndexFetchResult("Fetching lastest index is failed", false, null);
          + public static final IndexFetchResult INDEX_FETCH_SUCCESS = new IndexFetchResult("Fetching latest index is successful", true, null);
          + public static final IndexFetchResult LOCK_OBTAIN_FAILED = new IndexFetchResult("Obtaining SnapPuller lock failed", false, null);
          + public static final IndexFetchResult MASTER_VERSION_ZERO = new IndexFetchResult("Index in peer is empty and never committed yet", true, null);
          + public static final IndexFetchResult NO_INDEX_COMMIT_EXIST = new IndexFetchResult("No IndexCommit in local index", false, null);
          + public static final IndexFetchResult PEER_INDEX_COMMIT_DELETED = new IndexFetchResult("No files to download because IndexCommit in peer was deleted", false, null);
          + public static final IndexFetchResult LOCAL_ACTIVITY_DURING_REPLICATION = new IndexFetchResult("Local index modification during replication", false, null);
          + public static final IndexFetchResult CORE_NODE_IS_NOT_LEADER = new IndexFetchResult("Core Name Name Equals Leader Replica Name", false, null);
          — End diff –

          This message (and variable name) seems wrong. Looking at the case where this is thrown, it seems like this would happen if "OnlyLeaderIndexes" feature is being used, and this replica is the current leader. Message should be something like"Replicating from leader but I'm the shard leader"

          Show
          githubbot ASF GitHub Bot added a comment - Github user tflobbe commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/169#discussion_r107954150 — Diff: solr/core/src/java/org/apache/solr/handler/IndexFetcher.java — @@ -161,6 +163,52 @@ private Integer soTimeout; + private static final String INTERRUPT_RESPONSE_MESSAGE = "Interrupted while waiting for modify lock"; + + public static class IndexFetchResult { + private final String message; + private final boolean successful; + private final Throwable exception; + + public static final String FAILED_BY_INTERRUPT_MESSAGE = "Fetching index failed by interrupt"; + public static final String FAILED_BY_EXCEPTION_MESSAGE = "Fetching index failed by exception"; + + /** pre-defined results */ + public static final IndexFetchResult ALREADY_IN_SYNC = new IndexFetchResult("Local index commit is already in sync with peer", true, null); + public static final IndexFetchResult INDEX_FETCH_FAILURE = new IndexFetchResult("Fetching lastest index is failed", false, null); + public static final IndexFetchResult INDEX_FETCH_SUCCESS = new IndexFetchResult("Fetching latest index is successful", true, null); + public static final IndexFetchResult LOCK_OBTAIN_FAILED = new IndexFetchResult("Obtaining SnapPuller lock failed", false, null); + public static final IndexFetchResult MASTER_VERSION_ZERO = new IndexFetchResult("Index in peer is empty and never committed yet", true, null); + public static final IndexFetchResult NO_INDEX_COMMIT_EXIST = new IndexFetchResult("No IndexCommit in local index", false, null); + public static final IndexFetchResult PEER_INDEX_COMMIT_DELETED = new IndexFetchResult("No files to download because IndexCommit in peer was deleted", false, null); + public static final IndexFetchResult LOCAL_ACTIVITY_DURING_REPLICATION = new IndexFetchResult("Local index modification during replication", false, null); + public static final IndexFetchResult CORE_NODE_IS_NOT_LEADER = new IndexFetchResult("Core Name Name Equals Leader Replica Name", false, null); — End diff – This message (and variable name) seems wrong. Looking at the case where this is thrown, it seems like this would happen if "OnlyLeaderIndexes" feature is being used, and this replica is the current leader. Message should be something like"Replicating from leader but I'm the shard leader"
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user dsmiley commented on a diff in the pull request:

          https://github.com/apache/lucene-solr/pull/169#discussion_r107955318

          — Diff: solr/core/src/java/org/apache/solr/handler/IndexFetcher.java —
          @@ -161,6 +163,52 @@

          private Integer soTimeout;

          + private static final String INTERRUPT_RESPONSE_MESSAGE = "Interrupted while waiting for modify lock";
          +
          + public static class IndexFetchResult {
          + private final String message;
          + private final boolean successful;
          + private final Throwable exception;
          +
          + public static final String FAILED_BY_INTERRUPT_MESSAGE = "Fetching index failed by interrupt";
          + public static final String FAILED_BY_EXCEPTION_MESSAGE = "Fetching index failed by exception";
          +
          + /** pre-defined results */
          + public static final IndexFetchResult ALREADY_IN_SYNC = new IndexFetchResult("Local index commit is already in sync with peer", true, null);
          + public static final IndexFetchResult INDEX_FETCH_FAILURE = new IndexFetchResult("Fetching lastest index is failed", false, null);
          + public static final IndexFetchResult INDEX_FETCH_SUCCESS = new IndexFetchResult("Fetching latest index is successful", true, null);
          + public static final IndexFetchResult LOCK_OBTAIN_FAILED = new IndexFetchResult("Obtaining SnapPuller lock failed", false, null);
          + public static final IndexFetchResult MASTER_VERSION_ZERO = new IndexFetchResult("Index in peer is empty and never committed yet", true, null);
          + public static final IndexFetchResult NO_INDEX_COMMIT_EXIST = new IndexFetchResult("No IndexCommit in local index", false, null);
          + public static final IndexFetchResult PEER_INDEX_COMMIT_DELETED = new IndexFetchResult("No files to download because IndexCommit in peer was deleted", false, null);
          + public static final IndexFetchResult LOCAL_ACTIVITY_DURING_REPLICATION = new IndexFetchResult("Local index modification during replication", false, null);
          + public static final IndexFetchResult CORE_NODE_IS_NOT_LEADER = new IndexFetchResult("Core Name Name Equals Leader Replica Name", false, null);
          — End diff –

          Thanks @tflobbe . I can make this simple change before committing today. Perhaps the constant should be SELF_LEADER_REPLICATE or EXPECTING_NON_LEADER ? Be my guest to make suggestions.

          Show
          githubbot ASF GitHub Bot added a comment - Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/169#discussion_r107955318 — Diff: solr/core/src/java/org/apache/solr/handler/IndexFetcher.java — @@ -161,6 +163,52 @@ private Integer soTimeout; + private static final String INTERRUPT_RESPONSE_MESSAGE = "Interrupted while waiting for modify lock"; + + public static class IndexFetchResult { + private final String message; + private final boolean successful; + private final Throwable exception; + + public static final String FAILED_BY_INTERRUPT_MESSAGE = "Fetching index failed by interrupt"; + public static final String FAILED_BY_EXCEPTION_MESSAGE = "Fetching index failed by exception"; + + /** pre-defined results */ + public static final IndexFetchResult ALREADY_IN_SYNC = new IndexFetchResult("Local index commit is already in sync with peer", true, null); + public static final IndexFetchResult INDEX_FETCH_FAILURE = new IndexFetchResult("Fetching lastest index is failed", false, null); + public static final IndexFetchResult INDEX_FETCH_SUCCESS = new IndexFetchResult("Fetching latest index is successful", true, null); + public static final IndexFetchResult LOCK_OBTAIN_FAILED = new IndexFetchResult("Obtaining SnapPuller lock failed", false, null); + public static final IndexFetchResult MASTER_VERSION_ZERO = new IndexFetchResult("Index in peer is empty and never committed yet", true, null); + public static final IndexFetchResult NO_INDEX_COMMIT_EXIST = new IndexFetchResult("No IndexCommit in local index", false, null); + public static final IndexFetchResult PEER_INDEX_COMMIT_DELETED = new IndexFetchResult("No files to download because IndexCommit in peer was deleted", false, null); + public static final IndexFetchResult LOCAL_ACTIVITY_DURING_REPLICATION = new IndexFetchResult("Local index modification during replication", false, null); + public static final IndexFetchResult CORE_NODE_IS_NOT_LEADER = new IndexFetchResult("Core Name Name Equals Leader Replica Name", false, null); — End diff – Thanks @tflobbe . I can make this simple change before committing today. Perhaps the constant should be SELF_LEADER_REPLICATE or EXPECTING_NON_LEADER ? Be my guest to make suggestions.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user tflobbe commented on a diff in the pull request:

          https://github.com/apache/lucene-solr/pull/169#discussion_r107959546

          — Diff: solr/core/src/java/org/apache/solr/handler/IndexFetcher.java —
          @@ -161,6 +163,52 @@

          private Integer soTimeout;

          + private static final String INTERRUPT_RESPONSE_MESSAGE = "Interrupted while waiting for modify lock";
          +
          + public static class IndexFetchResult {
          + private final String message;
          + private final boolean successful;
          + private final Throwable exception;
          +
          + public static final String FAILED_BY_INTERRUPT_MESSAGE = "Fetching index failed by interrupt";
          + public static final String FAILED_BY_EXCEPTION_MESSAGE = "Fetching index failed by exception";
          +
          + /** pre-defined results */
          + public static final IndexFetchResult ALREADY_IN_SYNC = new IndexFetchResult("Local index commit is already in sync with peer", true, null);
          + public static final IndexFetchResult INDEX_FETCH_FAILURE = new IndexFetchResult("Fetching lastest index is failed", false, null);
          + public static final IndexFetchResult INDEX_FETCH_SUCCESS = new IndexFetchResult("Fetching latest index is successful", true, null);
          + public static final IndexFetchResult LOCK_OBTAIN_FAILED = new IndexFetchResult("Obtaining SnapPuller lock failed", false, null);
          + public static final IndexFetchResult MASTER_VERSION_ZERO = new IndexFetchResult("Index in peer is empty and never committed yet", true, null);
          + public static final IndexFetchResult NO_INDEX_COMMIT_EXIST = new IndexFetchResult("No IndexCommit in local index", false, null);
          + public static final IndexFetchResult PEER_INDEX_COMMIT_DELETED = new IndexFetchResult("No files to download because IndexCommit in peer was deleted", false, null);
          + public static final IndexFetchResult LOCAL_ACTIVITY_DURING_REPLICATION = new IndexFetchResult("Local index modification during replication", false, null);
          + public static final IndexFetchResult CORE_NODE_IS_NOT_LEADER = new IndexFetchResult("Core Name Name Equals Leader Replica Name", false, null);
          — End diff –

          Yes, those sound good. I personally like *EXPECTING_NON_LEADER* better

          Show
          githubbot ASF GitHub Bot added a comment - Github user tflobbe commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/169#discussion_r107959546 — Diff: solr/core/src/java/org/apache/solr/handler/IndexFetcher.java — @@ -161,6 +163,52 @@ private Integer soTimeout; + private static final String INTERRUPT_RESPONSE_MESSAGE = "Interrupted while waiting for modify lock"; + + public static class IndexFetchResult { + private final String message; + private final boolean successful; + private final Throwable exception; + + public static final String FAILED_BY_INTERRUPT_MESSAGE = "Fetching index failed by interrupt"; + public static final String FAILED_BY_EXCEPTION_MESSAGE = "Fetching index failed by exception"; + + /** pre-defined results */ + public static final IndexFetchResult ALREADY_IN_SYNC = new IndexFetchResult("Local index commit is already in sync with peer", true, null); + public static final IndexFetchResult INDEX_FETCH_FAILURE = new IndexFetchResult("Fetching lastest index is failed", false, null); + public static final IndexFetchResult INDEX_FETCH_SUCCESS = new IndexFetchResult("Fetching latest index is successful", true, null); + public static final IndexFetchResult LOCK_OBTAIN_FAILED = new IndexFetchResult("Obtaining SnapPuller lock failed", false, null); + public static final IndexFetchResult MASTER_VERSION_ZERO = new IndexFetchResult("Index in peer is empty and never committed yet", true, null); + public static final IndexFetchResult NO_INDEX_COMMIT_EXIST = new IndexFetchResult("No IndexCommit in local index", false, null); + public static final IndexFetchResult PEER_INDEX_COMMIT_DELETED = new IndexFetchResult("No files to download because IndexCommit in peer was deleted", false, null); + public static final IndexFetchResult LOCAL_ACTIVITY_DURING_REPLICATION = new IndexFetchResult("Local index modification during replication", false, null); + public static final IndexFetchResult CORE_NODE_IS_NOT_LEADER = new IndexFetchResult("Core Name Name Equals Leader Replica Name", false, null); — End diff – Yes, those sound good. I personally like * EXPECTING_NON_LEADER * better
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 8664f1f38a3741a7aff5988221cb7a1a7dda9e5b in lucene-solr's branch refs/heads/master from David Smiley
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=8664f1f ]

          SOLR-10249: Refactor IndexFetcher to return detailed result

          Show
          jira-bot ASF subversion and git services added a comment - Commit 8664f1f38a3741a7aff5988221cb7a1a7dda9e5b in lucene-solr's branch refs/heads/master from David Smiley [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=8664f1f ] SOLR-10249 : Refactor IndexFetcher to return detailed result
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit bd53fafc567d393dcf5318dbc99da3fd13177cc2 in lucene-solr's branch refs/heads/branch_6x from David Smiley
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=bd53faf ]

          SOLR-10249: Refactor IndexFetcher to return detailed result

          (cherry picked from commit 8664f1f)

          Show
          jira-bot ASF subversion and git services added a comment - Commit bd53fafc567d393dcf5318dbc99da3fd13177cc2 in lucene-solr's branch refs/heads/branch_6x from David Smiley [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=bd53faf ] SOLR-10249 : Refactor IndexFetcher to return detailed result (cherry picked from commit 8664f1f)

            People

            • Assignee:
              dsmiley David Smiley
              Reporter:
              millerjeff0 Jeff Miller
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Time Tracking

                Estimated:
                Original Estimate - 3h
                3h
                Remaining:
                Remaining Estimate - 3h
                3h
                Logged:
                Time Spent - Not Specified
                Not Specified

                  Development