Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.5.0, 1.5.1
    • Fix Version/s: 1.5.2, 1.6.0
    • Component/s: None
    • Labels: None

      Description

      While running the new upgrade script I noticed that a FATE operation failed. I think this was caused by the package name changes in 1.6. However, executing FATE ops across an upgrade is probably not safe; it's certainly not tested or easy to test. Discussed this on IRC: we should probably refuse to upgrade if the FATE stack is not empty.
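
      A minimal sketch of the proposed guard, to make the idea concrete. It is not the committed fix: `PendingTxSource`, `UpgradeGuard`, and `requireNoPendingTransactions` are hypothetical names invented for illustration, and the transaction store is reduced to a list of ids rather than Accumulo's actual FATE store API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical read-only view of the FATE transaction store (not Accumulo's real API).
interface PendingTxSource {
    // Returns the ids of all transactions still recorded in the store.
    List<Long> listTransactionIds();
}

final class UpgradeGuard {
    private UpgradeGuard() {}

    // Throws if any transaction remains, so the upgrade stops before
    // operations serialized by the old version are ever deserialized.
    static void requireNoPendingTransactions(PendingTxSource source) {
        List<Long> pending = source.listTransactionIds();
        if (!pending.isEmpty()) {
            throw new IllegalStateException(
                "Refusing to upgrade: " + pending.size() + " FATE transaction(s) pending. "
                + "Start the master under the previous version and wait for FATE to clear.");
        }
    }

    public static void main(String[] args) {
        // Empty store: the check passes silently.
        requireNoPendingTransactions(() -> Collections.emptyList());

        // Non-empty store: the check aborts with an explanatory message.
        try {
            requireNoPendingTransactions(() -> Arrays.asList(42L, 43L));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

      The check only inspects transaction ids and never deserializes the stored operations, which is the point: under the assumption above, it can run safely even when the serialized classes no longer exist under their old package names.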

      2014-03-20 18:20:40,724 [fate.Fate] ERROR: Thread "Repo runner 0" died java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.accumulo.server.master.tableOps.TraceRepo
      java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.accumulo.server.master.tableOps.TraceRepo
              at org.apache.accumulo.fate.ZooStore.top(ZooStore.java:266)
              at org.apache.accumulo.fate.AgeOffStore.top(AgeOffStore.java:172)
              at org.apache.accumulo.fate.Fate$TransactionRunner.run(Fate.java:58)
              at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:34)
              at java.lang.Thread.run(Thread.java:701)
      Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.accumulo.server.master.tableOps.TraceRepo
              at org.apache.accumulo.fate.ZooStore.deserialize(ZooStore.java:79)
              at org.apache.accumulo.fate.ZooStore.top(ZooStore.java:262)
              ... 4 more
      Caused by: java.lang.ClassNotFoundException: org.apache.accumulo.server.master.tableOps.TraceRepo
              at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
              at java.security.AccessController.doPrivileged(Native Method)
              at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
              at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
              at org.apache.accumulo.start.classloader.AccumuloClassLoader$2.loadClass(AccumuloClassLoader.java:278)
              at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
              at java.lang.Class.forName0(Native Method)
              at java.lang.Class.forName(Class.java:270)
              at java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:624)
              at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1611)
              at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1516)
              at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1770)
              at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1349)
              at java.io.ObjectInputStream.readObject(ObjectInputStream.java:369)
              at org.apache.accumulo.fate.ZooStore.deserialize(ZooStore.java:77)
              ... 5 more
      
      

      IRC conversation:

      <busbey> hurm. so how useful would a test set that injects faults into the !METADATA table be?
      <busbey> or into FATE
      <busbey> for that matter
      <busbey> to make sure that we have sufficient failure handling to avoid catastrophic loss
      <kturner> I think I saw a FATE related bug in the logs also
      <kturner> FATE serializes classes and pushes them on a stack in zookeeper
      <kturner> in 1.6 package names were changed, so things could not deserialize
      <busbey> oh boy
      <busbey> that's not good
      <busbey> so like they were serialized while the cluster was 1.5?
      <busbey> and then post upgrade explosions?
      <elserj> sounds like it
      <busbey> were package names changed 1.4 -> 1.5 related to fate?
      <kturner> yep
      <busbey> because in theory
      <busbey> I could have a 1.4 cluster
      <elserj> almost want to preserve classes which were renamed as deprecated
      <busbey> that I upgrade to 1.5 and then 1.6
      <busbey> and I could, in theory not allow enough time for FATE to clear out in the mean
      <busbey> well, or provide some kind of transition jar
      <busbey> that includes classes to allow for burn off
      <busbey> that you could later remove
      <busbey> this sounds like a blocker
      <busbey> barring some kind of documentation we could do
      <busbey> for safely shutting down a cluster in prep for an upgrade
      <busbey> the monitor doesn't show any indicators for waiting FATE operations, does it?
      <kturner> no
      <kturner> maybe 1.6 could refuse to upgrade if the FATE queue is not empty
      <busbey> filed CCUMULO-2517
      <busbey> well
      <busbey> 1) was this also a problem doing 1.4 -> 1.5?
      <busbey> and we just haven't had anyone hit it yet?
      <elserj> do you have an idea of how many renames this introduces, keith?
      <busbey> 2) that sounds like a good idea
      <busbey> as a first check, then just say "please start up the master under PREV_VERSION" and wait for FATE to clear
      <kturner> we could do the same thing for 1.5
      <busbey> with a ref to upgrade notes that explain how to check if FATE is clear?
      <kturner> yeah
      <busbey> that will require we finish ACCUMULO-2469, I presume?
      <busbey> (that's the ticket for documenting how to access zookeeper)
      <busbey> two additional tickets or one?
      <elserj> there's a class that will print fate ops
      <busbey> 1) upgrade instructions should include how to check if there are fate operations pending
      <busbey> 2) upgrade code should refuse to upgrade if there are fate operations pending
      <busbey> nice! we could use that and leave 2469 for later, then?
      <ctubbsii_bot> https://issues.apache.org/jira/browse/ACCUMULO-2469
      <elserj> ctubbsii_bot you need to trim punctuation
      <busbey> do those two sound like they cover the FATE bug?
      <busbey> I presume we don't know enough yet to make a call on the delete marker thing?
      <busbey> and that any additional guards on the GC should be aiming for post-1.6?
      <kturner> I am creating a ticket, any problem w/ me just plopping this conversation onto the ticket?
      <busbey> sounds good
      <kturner> elserj?
      <elserj> oh, sure
      

          Activity

          busbey Sean Busbey added a comment -

          updated the versions to reflect that this also applies to upgrades from 1.4.x with FATE to 1.5.x.

          kturner Keith Turner added a comment -

          I was running patch v3 of the upgrade script from ACCUMULO-2145

          busbey Sean Busbey added a comment -

          increasing to Blocker, since the failure mode involves recovery by poking at ZooKeeper, which we don't have documentation for right now

          jira-bot ASF subversion and git services added a comment -

          Commit 5a504b311c0e5f59ff5b14221c6bf61f43b4d093 in accumulo's branch refs/heads/1.5.2-SNAPSHOT from Sean Busbey
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=5a504b3 ]

          ACCUMULO-2519 Aborts upgrade if there are Fate transactions from an old version.

          jira-bot ASF subversion and git services added a comment -

          Commit 5a504b311c0e5f59ff5b14221c6bf61f43b4d093 in accumulo's branch refs/heads/1.6.0-SNAPSHOT from Sean Busbey
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=5a504b3 ]

          ACCUMULO-2519 Aborts upgrade if there are Fate transactions from an old version.

          jira-bot ASF subversion and git services added a comment -

          Commit e4aa11e1b1a046dec9116273eb57f053aa68fd3f in accumulo's branch refs/heads/1.6.0-SNAPSHOT from Sean Busbey
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=e4aa11e ]

          ACCUMULO-2519 Updates Classes added in 1.6.0 for read only fate changes.

          jira-bot ASF subversion and git services added a comment -

          Commit 5a504b311c0e5f59ff5b14221c6bf61f43b4d093 in accumulo's branch refs/heads/master from Sean Busbey
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=5a504b3 ]

          ACCUMULO-2519 Aborts upgrade if there are Fate transactions from an old version.

          jira-bot ASF subversion and git services added a comment -

          Commit e4aa11e1b1a046dec9116273eb57f053aa68fd3f in accumulo's branch refs/heads/master from Sean Busbey
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=e4aa11e ]

          ACCUMULO-2519 Updates Classes added in 1.6.0 for read only fate changes.

          kturner Keith Turner added a comment -

          I think the changes made for this issue fix the problem described in ACCUMULO-2140.


            People

            • Assignee: busbey Sean Busbey
            • Reporter: kturner Keith Turner
            • Votes: 0
            • Watchers: 4
