Solr / SOLR-7378

Be more conservative about loading a core when hdfs transaction log could not be recovered

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • 5.0
    • None
    • None

    Description

      Today, if an HdfsTransactionLog cannot recover its lease, you get the following warning in the log:

            log.warn("Cannot recoverLease after trying for " +
              conf.getInt("solr.hdfs.lease.recovery.timeout", 900000) +
              "ms (solr.hdfs.lease.recovery.timeout); continuing, but may be DATALOSS!!!; " +
              getLogMessageDetail(nbAttempt, p, startWaiting));
      

      from: https://github.com/apache/lucene-solr/blob/a8c24b7f02d4e4c172926d04654bcc007f6c29d2/solr/core/src/java/org/apache/solr/util/FSHDFSUtils.java#L145-L148
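      The retry-then-warn behavior described above can be modeled roughly as follows. This is a simplified, hypothetical sketch, not the actual FSHDFSUtils code: a BooleanSupplier stands in for a single lease-recovery attempt, java.util.logging replaces Solr's logger, and the class and method names are made up for illustration. The timeout plays the role of solr.hdfs.lease.recovery.timeout (900000 ms by default, per the snippet).

```java
import java.util.function.BooleanSupplier;
import java.util.logging.Logger;

// Hypothetical model of "retry lease recovery until a timeout, then warn and continue".
// Not the real FSHDFSUtils implementation.
public class LeaseRecoverySketch {
  private static final Logger log = Logger.getLogger(LeaseRecoverySketch.class.getName());

  /**
   * Calls {@code attempt} repeatedly until it returns true or {@code timeoutMs} elapses.
   * Returns true if recovery succeeded, false if we gave up.
   */
  static boolean recoverWithTimeout(BooleanSupplier attempt, long timeoutMs, long pauseMs) {
    long start = System.nanoTime();
    int nbAttempt = 0;
    while (true) {
      nbAttempt++;
      if (attempt.getAsBoolean()) {
        return true; // lease recovered
      }
      long elapsedMs = (System.nanoTime() - start) / 1_000_000;
      if (elapsedMs >= timeoutMs) {
        // Mirrors the current behavior: only a warning is emitted; the caller keeps going.
        log.warning("Cannot recoverLease after trying for " + timeoutMs
            + "ms; continuing, but may be DATALOSS!!! attempts=" + nbAttempt);
        return false;
      }
      try {
        Thread.sleep(pauseMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve interrupt status and give up
        return false;
      }
    }
  }

  public static void main(String[] args) {
    int[] calls = {0};
    // Succeeds on the third attempt.
    boolean ok = recoverWithTimeout(() -> ++calls[0] >= 3, 1_000, 10);
    System.out.println("recovered=" + ok + " after " + calls[0] + " attempts");
    // Never succeeds: gives up after roughly 50 ms and logs the warning.
    System.out.println("recovered=" + recoverWithTimeout(() -> false, 50, 10));
  }
}
```

      In this sketch the caller only gets false back, and in the code quoted above the only signal is the log warning, so nothing forces core loading to stop after a failed recovery.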

      But some deployments may not want to continue when there is potential data loss; they may prefer to investigate the underlying HDFS issue first. Today there is no way to detect this situation other than by reading the logs.

      There's a range of possibilities here, but here are a couple of ideas:
      1) a config parameter controlling whether to continue loading the core despite potential data loss
      2) load the core, but require a special request flag to read potentially incorrect data (similar to shards.tolerant; perhaps data.tolerant or something like it?)
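      The two ideas above could be contrasted with a small sketch like the one below. Everything here is hypothetical: none of these classes exist in Solr, failOnPotentialDataLoss is an invented name for the config parameter in idea 1, and data.tolerant is only the tentative name floated in idea 2.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical illustration of the two proposals; not Solr code.
public class DataLossPolicySketch {

  /** Thrown when a core refuses to load because its tlog lease was not recovered. */
  static final class CoreLoadException extends RuntimeException {
    CoreLoadException(String msg) { super(msg); }
  }

  /** Minimal stand-in for a loaded core; only tracks whether data loss is possible. */
  static final class LoadedCore {
    final boolean possibleDataLoss;
    LoadedCore(boolean possibleDataLoss) { this.possibleDataLoss = possibleDataLoss; }
  }

  // Idea 1: a (hypothetical) config flag decides whether an unrecovered lease aborts loading.
  static LoadedCore loadCore(boolean leaseRecovered, boolean failOnPotentialDataLoss) {
    if (!leaseRecovered && failOnPotentialDataLoss) {
      throw new CoreLoadException("transaction log lease not recovered; refusing to load core");
    }
    // Loaded normally, or loaded despite possible loss because the operator allowed it.
    return new LoadedCore(!leaseRecovered);
  }

  // Idea 2: a possibly lossy core only serves requests that opt in with data.tolerant=true.
  static void checkReadAllowed(LoadedCore core, Map<String, String> params) {
    boolean tolerant = Boolean.parseBoolean(params.getOrDefault("data.tolerant", "false"));
    if (core.possibleDataLoss && !tolerant) {
      throw new IllegalStateException(
          "core may have lost data; pass data.tolerant=true to read it anyway");
    }
  }

  public static void main(String[] args) {
    try {
      loadCore(false, true);
    } catch (CoreLoadException e) {
      System.out.println("idea 1: " + e.getMessage());
    }
    LoadedCore core = loadCore(false, false);
    checkReadAllowed(core, Collections.singletonMap("data.tolerant", "true")); // allowed
    try {
      checkReadAllowed(core, Collections.<String, String>emptyMap());
    } catch (IllegalStateException e) {
      System.out.println("idea 2: " + e.getMessage());
    }
  }
}
```

      The difference is where the decision is made: idea 1 fails once, at load time, for the whole core, while idea 2 loads the core and pushes the choice to each request, in the same spirit as shards.tolerant.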

          People

            Assignee: Unassigned
            Reporter: Gregory Chanan (gchanan)
            Votes: 0
            Watchers: 2