<busbey> hurm. so how useful would a test set that injects faults into the !METADATA table be?
<busbey> or into FATE
<busbey> for that matter
<busbey> to make sure that we have sufficient failure handling to avoid catastrophic loss
<kturner> I think I saw a FATE related bug in the logs also
<kturner> FATE serializes classes and pushes them on a stack in zookeeper
<kturner> in 1.6 package names were changed, so things could not deserialize
<busbey> oh boy
<busbey> that's not good
<busbey> so like they were serialized while the cluster was 1.5?
<busbey> and then post upgrade explosions?
<elserj> sounds like it
<busbey> were package names changed 1.4 -> 1.5 related to fate?
<busbey> because in theory
<busbey> I could have a 1.4 cluster
<elserj> almost want to preserve classes which were renamed as deprecated
<busbey> that I upgrade to 1.5 and then 1.6
<busbey> and I could, in theory, not allow enough time for FATE to clear out in the meantime
<busbey> well, or provide some kind of transition jar
<busbey> that includes classes to allow for burn off
<busbey> that you could later remove
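The failure mode and the "transition jar" idea above can be sketched in miniature. This is only an analogy: it uses Python's pickle as a stand-in for the Java serialization FATE relies on, and every name in it (`fate_old_pkg`, `fate_new_pkg`, `RenameTable`, `TransitionUnpickler`) is hypothetical and not part of Accumulo. It assumes the serialized form records the class's package-qualified name, which is what makes a rename fatal for data written before the upgrade.

```python
import io
import pickle
import sys
import types

OLD, NEW = "fate_old_pkg", "fate_new_pkg"  # hypothetical package names


class RenameTable:
    """Stand-in for a serialized FATE operation (illustrative only)."""

    def __init__(self, table):
        self.table = table


def publish(cls, module_name):
    """Make cls importable as module_name.<class name>, as if it lived there."""
    mod = types.ModuleType(module_name)
    cls.__module__ = module_name
    cls.__qualname__ = cls.__name__
    setattr(mod, cls.__name__, cls)
    sys.modules[module_name] = mod


# "Old release": serialize an operation under the old package name.
publish(RenameTable, OLD)
stored = pickle.dumps(RenameTable("t1"))

# "Upgrade": the class now lives under a new name and the old name is gone.
del sys.modules[OLD]
publish(RenameTable, NEW)


def load_plain(data):
    """Deserialize with no compatibility help; fails on data from the old release."""
    return pickle.loads(data)


class TransitionUnpickler(pickle.Unpickler):
    """Maps the old package name to the new one while old data burns off."""

    def find_class(self, module, name):
        if module == OLD:
            module = NEW
        return super().find_class(module, name)


def load_with_shim(data):
    """Deserialize through the temporary old-name-to-new-name mapping."""
    return TransitionUnpickler(io.BytesIO(data)).load()
```

Here `load_plain(stored)` raises an import error because the old name no longer resolves, while `load_with_shim(stored)` recovers the operation. The shim plays the role of the transition jar: it exists only so that already-queued operations can drain, and it can be dropped in a later release.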
<busbey> this sounds like a blocker
<busbey> barring some kind of documentation we could do
<busbey> for safely shutting down a cluster in prep for an upgrade
<busbey> the monitor doesn't show any indicators for waiting FATE operations, does it?
<kturner> maybe 1.6 could refuse to upgrade if the FATE queue is not empty
<busbey> filed ACCUMULO-2517
<busbey> 1) was this also a problem doing 1.4 -> 1.5?
<busbey> and we just haven't had anyone hit it yet?
<elserj> do you have an idea of how many renames this introduces, keith?
<busbey> 2) that sounds like a good idea
<busbey> as a first check, then just say "please start up the master under PREV_VERSION" and wait for FATE to clear
<kturner> we could do the same thing for 1.5
<busbey> with a ref to upgrade notes that explain how to check if FATE is clear?
<busbey> that will require we finish ACCUMULO-2469, I presume?
<busbey> (that's the ticket for documenting how to access zookeeper)
<busbey> two additional tickets or one?
<elserj> there's a class that will print fate ops
<busbey> 1) upgrade instructions should include how to check if there are fate operations pending
<busbey> 2) upgrade code should refuse to upgrade if there are FATE operations pending
<busbey> nice! we could use that and leave 2469 for later, then?
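The second proposal (refuse to upgrade while FATE operations are pending) might look roughly like the sketch below. It is a minimal illustration under stated assumptions, not Accumulo's upgrade code: `check_safe_to_upgrade` and `PendingFateOperationsError` are hypothetical names, and the caller is assumed to have already collected the pending transaction ids (for example from the class that prints FATE ops). The error text follows the wording suggested in the discussion.

```python
class PendingFateOperationsError(RuntimeError):
    """Raised when an upgrade is attempted while FATE operations are queued."""


def check_safe_to_upgrade(pending_txids, prev_version):
    """Refuse to proceed if any FATE transaction ids are still pending.

    pending_txids: iterable of transaction ids gathered by the caller
                   (how they are gathered is outside this sketch).
    prev_version:  version string the cluster is being upgraded from.
    """
    pending = sorted(pending_txids)
    if pending:
        ids = ", ".join(str(txid) for txid in pending)
        raise PendingFateOperationsError(
            f"{len(pending)} FATE operation(s) still pending ({ids}); "
            f"please start up the master under {prev_version} "
            "and wait for FATE to clear before upgrading"
        )
```

Calling `check_safe_to_upgrade([], "1.5")` returns without error, while a non-empty list raises and names the version to restart under. Failing fast before any upgrade step runs keeps the queued operations readable by the release that wrote them.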
<elserj> ctubbsii_bot you need to trim punctuation
<busbey> do those two sound like they cover the FATE bug?
<busbey> I presume we don't know enough yet to make a call on the delete marker thing?
<busbey> and that any additional guards on the GC should be aiming for post-1.6?
<kturner> I am creating a ticket, any problem w/ me just plopping this conversation onto the ticket?
<busbey> sounds good
<elserj> oh, sure