Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.1
    • Fix Version/s: 4.3.1, 4.4, 6.0
    • Component/s: None
    • Labels:
      None

      Description

      Failure to dereference tlogs or RecentUpdates can cause old transaction logs to never be closed & deleted.

      1. SOLR-4829.patch
        6 kB
        Yonik Seeley
      2. SOLR-4829.patch
        4 kB
        Yonik Seeley
      3. SOLR-4829.patch
        4 kB
        Yonik Seeley

          Activity

          Yonik Seeley added a comment -

          After a code review, one source of the leak is in ElectionContext.java:

                if (!success && ulog.getRecentUpdates().getVersions(1).isEmpty()) {
          

          introduced in SOLR-3933 (Solr 4.1)
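
To illustrate why a one-line call chain like the one above can pin a transaction log, here is a minimal sketch of a reference-counted log and a view over it. All class and method names (SketchTlog, SketchRecentUpdates, LeakSketch) are hypothetical stand-ins, not Solr's actual classes; the sketch only assumes that the object returned by getRecentUpdates() holds a reference that must be released explicitly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a transaction log guarded by a reference count.
class SketchTlog {
    private final AtomicInteger refCount = new AtomicInteger(1); // the owner's reference
    private volatile boolean closed = false;

    void incref() { refCount.incrementAndGet(); }

    void decref() {
        if (refCount.decrementAndGet() == 0) {
            closed = true; // a real log would close and delete its file here
        }
    }

    boolean isClosed() { return closed; }
}

// Hypothetical stand-in for the view returned by getRecentUpdates().
class SketchRecentUpdates implements AutoCloseable {
    private final SketchTlog tlog;
    private final List<Long> versions;

    SketchRecentUpdates(SketchTlog tlog, List<Long> versions) {
        this.tlog = tlog;
        this.versions = versions;
        tlog.incref(); // the view pins the log for as long as it stays open
    }

    List<Long> getVersions(int n) {
        return new ArrayList<>(versions.subList(0, Math.min(n, versions.size())));
    }

    @Override
    public void close() { tlog.decref(); }
}

class LeakSketch {
    // Leaky shape: the view is created inline and never closed.
    static boolean hasNoUpdatesLeaky(SketchTlog tlog, List<Long> versions) {
        return new SketchRecentUpdates(tlog, versions).getVersions(1).isEmpty();
    }

    // Safe shape: the view is always released, even if getVersions throws.
    static boolean hasNoUpdatesSafe(SketchTlog tlog, List<Long> versions) {
        try (SketchRecentUpdates recent = new SketchRecentUpdates(tlog, versions)) {
            return recent.getVersions(1).isEmpty();
        }
    }
}
```

In the leaky variant the count never returns to zero after the owner releases its own reference, so the log is never closed. That matches the symptom in the description.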

          Mark Miller added a comment -

          It's actually SOLR-3939

          Mark Miller added a comment -

          This one actually occurred to me when I was reading the user thread on this the other day. It didn't seem like the culprit for that guy, though, because it only happens on election (unless he was losing the leader consistently for some ugly reason).

          Yonik Seeley added a comment -

          Here's a patch that should hopefully fix things up wrt getRecentUpdates.

          Mark Miller added a comment -

          This looks like it reintroduces the NPE you can get with no ulog in ElectionContext. When I put in the null check yesterday or the day before, I was torn between two options: just letting the node become leader if it has no ulog and was active, or throwing a specific exception about having no ulog. I ended up choosing the former, thinking that if we didn't want to support no ulog in SolrCloud mode, that should be checked on startup.
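
The two options weighed in this comment can be sketched side by side. This is only an illustration under stated assumptions: LeaderCheckSketch, RecentVersions, and the parameter names are hypothetical, and the control flow is a simplified guess at the shape of the check, not the code in ElectionContext.

```java
import java.util.List;

class LeaderCheckSketch {
    // Hypothetical stand-in for an update log that can report recent versions.
    interface RecentVersions {
        List<Long> latest(int n);
    }

    // Option 1 (lenient): with no update log, a previously active node may lead.
    static boolean lenient(boolean syncSucceeded, boolean wasActive, RecentVersions ulog) {
        if (syncSucceeded) {
            return true;
        }
        if (ulog == null) {
            return wasActive; // null check avoids the NPE; nothing to compare against
        }
        return ulog.latest(1).isEmpty(); // no versions recorded, so sync could not have helped
    }

    // Option 2 (strict): a missing update log is reported as a specific error.
    static boolean strict(boolean syncSucceeded, RecentVersions ulog) {
        if (ulog == null) {
            throw new IllegalStateException("no update log configured");
        }
        return syncSucceeded || ulog.latest(1).isEmpty();
    }
}
```

The lenient variant keeps election working for configurations without an update log and leaves it to a startup check to reject them if they are unsupported.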

          Yonik Seeley added a comment -

          "This looks like it reintroduces the NPE you can get with no ulog in ElectionContext"

          Ah, thanks - I got a merge conflict and then missed your update. I'll fix.

          Yonik Seeley added a comment -

          Here's another version that cleans up tlog references during tlog recovery in the event of an unexpected exception (like a commit throwing something other than IOException).

          Aside: we should keep in mind that when index corruption happens, Lucene can throw exceptions other than IOException.
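
The point about unexpected exception types can be shown with a small sketch. It assumes a reference count that recovery increments before replaying and must decrement afterwards; ReplaySketch and its Commit interface are hypothetical names, not part of Solr or Lucene.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

class ReplaySketch {
    // Hypothetical commit step that is declared to throw IOException
    // but may also fail with an unchecked exception.
    interface Commit {
        void run() throws IOException;
    }

    // Leaky shape: the reference is released only on success or IOException.
    static void replayLeaky(AtomicInteger refs, Commit commit) {
        refs.incrementAndGet();
        try {
            commit.run();
            refs.decrementAndGet();
        } catch (IOException e) {
            refs.decrementAndGet();
        }
    }

    // Safe shape: the reference is released in finally, whatever is thrown.
    static void replaySafe(AtomicInteger refs, Commit commit) {
        refs.incrementAndGet();
        try {
            commit.run();
        } catch (IOException e) {
            // expected failure mode; real code would log it
        } finally {
            refs.decrementAndGet();
        }
    }
}
```

If the commit throws an unchecked exception, the leaky variant leaves the count at one and the exception propagates. The safe variant still propagates the exception but returns the count to zero first.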

          Steven Bower added a comment -

          I applied the attached patch to my 4.3.0 install, on a Solr instance with the broken index that was causing this issue to begin with. I see that the tlog files are now properly limited to 10 files, and I am not building up orphaned FileDescriptors any more. Additionally, I've verified with lsof that the tlog isn't leaking open file...

          Yonik Seeley added a comment -

          Thanks for verifying, Steven!
          Committed to trunk, 4x, 4.3.1

          Shalin Shekhar Mangar added a comment -

          Bulk close after 4.3.1 release


            People

            • Assignee:
              Yonik Seeley
              Reporter:
              Yonik Seeley
            • Votes:
              0
              Watchers:
              6
