SOLR-9954: SnapShooter createSnapshot can swallow an exception raised by the underlying backup repo

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 6.2.1, 6.3
    • Fix Version/s: 6.4, 7.0
    • Component/s: Hadoop Integration
    • Security Level: Public (Default Security Level. Issues are Public)
    • Labels:
      None

      Description

      While configuring the HdfsBackupRepository to use Google Cloud Storage, I misconfigured the permissions on my bucket. Unfortunately, the exception that would have pointed me in the right direction gets squelched by the finally block in createSnapshot:

          } finally {
            if (!success) {
              backupRepo.deleteDirectory(snapshotDirPath);
            }
          }
      

      If there's a permissions issue, then deleteDirectory is also going to fail and raise another exception from the finally block, which swallows the original exception. For example:

      ERROR - 2017-01-10 18:38:52.650; [c:gettingstarted s:shard1 r:core_node1 x:gettingstarted_shard1_replica1] org.apache.solr.handler.SnapShooter; Exception while creating snapshot
      java.io.IOException: GoogleHadoopFileSystem has been closed or not initialized.
          at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.checkOpen(GoogleHadoopFileSystemBase.java:1927)
          at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.delete(GoogleHadoopFileSystemBase.java:1255)
          at org.apache.solr.core.backup.repository.HdfsBackupRepository.deleteDirectory(HdfsBackupRepository.java:160)
          at org.apache.solr.handler.SnapShooter.createSnapshot(SnapShooter.java:234)
          at org.apache.solr.handler.SnapShooter.lambda$createSnapAsync$1(SnapShooter.java:186)
          at org.apache.solr.handler.SnapShooter$$Lambda$89/43739789.run(Unknown Source)
          at java.lang.Thread.run(Thread.java:745)
      

      That's merely the symptom and not the actual cause of the failure.
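The masking behavior described above can be reproduced in isolation. The following is a minimal sketch, not Solr code: the class name, the `deleteDirectory` stand-in, and the exception messages are all hypothetical, chosen only to mirror the shape of the finally block quoted earlier.

```java
import java.io.IOException;

public class SwallowedExceptionDemo {

    // Hypothetical stand-in for a backup repository whose cleanup call also fails.
    static void deleteDirectory(String path) throws IOException {
        throw new IOException("cleanup failed: file system closed");
    }

    // Mirrors the shape of the problematic finally block.
    static void createSnapshot() throws IOException {
        boolean success = false;
        try {
            // The real cause of the failure, e.g. a permissions error.
            throw new IOException("permission denied on bucket");
        } finally {
            if (!success) {
                // This call throws as well; its exception replaces the one above.
                deleteDirectory("snapshot.tmp");
            }
        }
    }

    // Returns the message of whichever exception actually reaches the caller.
    static String observedMessage() {
        try {
            createSnapshot();
            return "no exception";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // prints "cleanup failed: file system closed"
        System.out.println(observedMessage());
    }
}
```

The caller only ever sees the cleanup failure; the "permission denied" exception is discarded when the finally block completes abruptly.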

      1. SOLR-9954.patch
        0.7 kB
        Timothy Potter

          Activity

          thelabdude Timothy Potter added a comment (edited)

          Here's a patch (against the 6.2.1 tag) that logs the failed delete as a warning and allows the actual exception to propagate out of this method correctly. I'll work a PR through to 6x from master...
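The approach described in this comment (log the failed delete as a warning, let the original exception propagate) might look roughly like the sketch below. This is an illustration under assumptions, not the contents of the attached patch: the class and method names are hypothetical, and `java.util.logging` is used only to keep the example self-contained.

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

public class CleanupPreservesCauseDemo {

    private static final Logger LOG = Logger.getLogger("CleanupPreservesCauseDemo");

    // Hypothetical stand-in for a backup repository whose cleanup call fails.
    static void deleteDirectory(String path) throws IOException {
        throw new IOException("cleanup failed: file system closed");
    }

    static void createSnapshot() throws IOException {
        boolean success = false;
        try {
            // The real cause of the failure, e.g. a permissions error.
            throw new IOException("permission denied on bucket");
        } finally {
            if (!success) {
                try {
                    deleteDirectory("snapshot.tmp");
                } catch (Exception cleanupFailure) {
                    // Log and continue so the original exception is the one that propagates.
                    LOG.log(Level.WARNING,
                            "Failed to delete snapshot.tmp after snapshot creation failed",
                            cleanupFailure);
                }
            }
        }
    }

    // Returns the message of whichever exception actually reaches the caller.
    static String observedMessage() {
        try {
            createSnapshot();
            return "no exception";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // prints "permission denied on bucket" (after the logged warning)
        System.out.println(observedMessage());
    }
}
```

Because the cleanup failure is caught inside the finally block, the finally block completes normally and the exception thrown in the try block continues to the caller.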

          githubbot ASF GitHub Bot added a comment -

          GitHub user thelabdude opened a pull request:

          https://github.com/apache/lucene-solr/pull/137

          SOLR-9954: Prevent against failure during failed snapshot cleanup from swallowing the actual cause for the snapshot to fail.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/apache/lucene-solr jira/solr-9954

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/lucene-solr/pull/137.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #137


          commit 4856550b45548f376b6292aeef4d501fd3d85fd2
          Author: Timothy Potter <thelabdude@gmail.com>
          Date: 2017-01-11T00:33:50Z

          SOLR-9954: Prevent against failure during failed snapshot cleanup from swallowing the actual cause for the snapshot to fail.


          thelabdude Timothy Potter added a comment -

          I'd like to include this in 6.4 -> https://github.com/apache/lucene-solr/pull/137

          varunthacker Varun Thacker added a comment -

          +1 for the patch

          jira-bot ASF subversion and git services added a comment -

          Commit 118fc422d0cff8492db99edccb3d73068cf04b52 in lucene-solr's branch refs/heads/master from Timothy Potter
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=118fc42 ]

          SOLR-9954: Prevent against failure during failed snapshot cleanup from swallowing the actual cause for the snapshot to fail.

          jira-bot ASF subversion and git services added a comment -

          Commit f36a493d55bb9ed5676710146dcf3c51c7983ea6 in lucene-solr's branch refs/heads/branch_6x from Timothy Potter
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f36a493 ]

          SOLR-9954: Prevent against failure during failed snapshot cleanup from swallowing the actual cause for the snapshot to fail.

          githubbot ASF GitHub Bot added a comment -

          Github user thelabdude closed the pull request at:

          https://github.com/apache/lucene-solr/pull/137

          hgadre Hrishikesh Gadre added a comment -

          Timothy Potter, Varun Thacker: should we check file permissions up front? That seems better from a usability perspective...


            People

            • Assignee:
              thelabdude Timothy Potter
            • Reporter:
              thelabdude Timothy Potter
            • Votes:
              0
            • Watchers:
              5
