Crunch
  1. Crunch
  2. CRUNCH-142

Context not being passed to FilterFn instances wrapped in boolean AndFn,OrFn,NotFn

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.4.0
    • Fix Version/s: 0.5.0
    • Component/s: Core
    • Labels:
      None

      Description

      The boolean filter classes AndFn, OrFn and NotFn delegate calls to FilterFn instances internally, but setContext is not called on these first.

      As a result, methods which need the context such as getCounter() fail with NullPointerException.

      1. CRUNCH-142.patch
        2 kB
        Dave Beech
      2. CRUNCH-142_3.patch
        10 kB
        Gabriel Reid
      3. CRUNCH-142_2.patch
        3 kB
        Gabriel Reid
      4. CRUNCH-142_1.patch
        3 kB
        Gabriel Reid

        Activity

        Hide
        Josh Wills added a comment -

        Closing out the issus for the 0.5.0 release.

        Show
        Josh Wills added a comment - Closing out the issus for the 0.5.0 release.
        Hide
        Gabriel Reid added a comment -

        Pushed CRUNCH-142_3.patch to master

        Show
        Gabriel Reid added a comment - Pushed CRUNCH-142 _3.patch to master
        Hide
        Dave Beech added a comment -

        +1 - thanks Gabriel

        Show
        Dave Beech added a comment - +1 - thanks Gabriel
        Hide
        Josh Wills added a comment -

        Nice-- +1.

        Show
        Josh Wills added a comment - Nice-- +1.
        Hide
        Gabriel Reid added a comment -

        Non-cumulative patch that adds the delegate call to cleanup() in decorator FilterFns (And, Or, Not). Also added unit testing for the decorator FilterFns.

        Show
        Gabriel Reid added a comment - Non-cumulative patch that adds the delegate call to cleanup() in decorator FilterFns (And, Or, Not). Also added unit testing for the decorator FilterFns.
        Hide
        Josh Wills added a comment -

        I prefer Gabriel's solution here-- best to communicate these things via the API as opposed to RuntimeExceptions, I think.

        Show
        Josh Wills added a comment - I prefer Gabriel's solution here-- best to communicate these things via the API as opposed to RuntimeExceptions, I think.
        Hide
        Dave Beech added a comment -

        I like Gabriel's suggestion of a no-parameter cleanup method just for FilterFn too.

        Show
        Dave Beech added a comment - I like Gabriel's suggestion of a no-parameter cleanup method just for FilterFn too.
        Hide
        Dave Beech added a comment - - edited

        Josh - yeah, I think that'd be OK. The behaviour might still be inconsistent depending on how you've decided to chain the filters, but for me it's still much better than not calling the child function at all.

        Show
        Dave Beech added a comment - - edited Josh - yeah, I think that'd be OK. The behaviour might still be inconsistent depending on how you've decided to chain the filters, but for me it's still much better than not calling the child function at all.
        Hide
        Gabriel Reid added a comment -

        I can imagine a situation where one of AndFn's delegate FilterFn's is making use of an external resource that was connected to in initialize, and so it should be released in cleanup (regardless of whether or not this is a good idea in a MapReduce context). With this in mind, it makes sense to delegate the cleanup call.

        On the other hand, a FilterFn that is outputting stuff in the cleanup method is definitely breaking the FilterFn contract.

        The safest thing that I can see that could be done while maintaining compatibility is to make FilterFn#cleanup(Emitter) final, with it calling a FilterFn-specific cleanup method that doesn't take any parameters. However, this seems like a lot of messing around for something that shouldn't happen anyhow, so I'm still leaning towards Dave's solution.

        Show
        Gabriel Reid added a comment - I can imagine a situation where one of AndFn's delegate FilterFn's is making use of an external resource that was connected to in initialize, and so it should be released in cleanup (regardless of whether or not this is a good idea in a MapReduce context). With this in mind, it makes sense to delegate the cleanup call. On the other hand, a FilterFn that is outputting stuff in the cleanup method is definitely breaking the FilterFn contract. The safest thing that I can see that could be done while maintaining compatibility is to make FilterFn#cleanup(Emitter) final, with it calling a FilterFn-specific cleanup method that doesn't take any parameters. However, this seems like a lot of messing around for something that shouldn't happen anyhow, so I'm still leaning towards Dave's solution.
        Hide
        Josh Wills added a comment -

        What about calling the child cleanup functions and passing it an Emitter instance that throws UnsupportedOperationExceptions when you try to emit?

        Show
        Josh Wills added a comment - What about calling the child cleanup functions and passing it an Emitter instance that throws UnsupportedOperationExceptions when you try to emit?
        Hide
        Dave Beech added a comment - - edited

        I can see the problem and you're right, there's no clear right thing for AndFn to do in its cleanup method. But on the other hand I'm not sure it's relevant. The main problem for me is if you allow users to override methods, but then silently fail to run their code under certain cases.

        I think these two lines should be functionally equivalent:
        PCollection<?> filtered1 = collection.filter(fnA).filter(fnB);
        PCollection<?> filtered2 = collection.filter(FilterFns.and(fnA, fnB));

        But, without a delegating call to cleanup in AndFn, depending on what I'd put in my cleanup overrides, I might get different results.

        How about an unconditional call to cleanup on all the child FilterFns, regardless of whether they were a part of an And, Or or Not. If the user decides to emit a ton of stuff in cleanup which violates the filter logic, then that may be bad form but surely that's their decision?

        Show
        Dave Beech added a comment - - edited I can see the problem and you're right, there's no clear right thing for AndFn to do in its cleanup method. But on the other hand I'm not sure it's relevant. The main problem for me is if you allow users to override methods, but then silently fail to run their code under certain cases. I think these two lines should be functionally equivalent: PCollection<?> filtered1 = collection.filter(fnA).filter(fnB); PCollection<?> filtered2 = collection.filter(FilterFns.and(fnA, fnB)); But, without a delegating call to cleanup in AndFn, depending on what I'd put in my cleanup overrides, I might get different results. How about an unconditional call to cleanup on all the child FilterFns, regardless of whether they were a part of an And, Or or Not. If the user decides to emit a ton of stuff in cleanup which violates the filter logic, then that may be bad form but surely that's their decision?
        Hide
        Josh Wills added a comment -

        I don't see how it would work in this context-- e.g., what is the right thing for the AndFn's cleanup() method to do that would be consistent w/the AndFn's contract (e.g., only emit a value if it is emitted by all of its children?)

        Show
        Josh Wills added a comment - I don't see how it would work in this context-- e.g., what is the right thing for the AndFn's cleanup() method to do that would be consistent w/the AndFn's contract (e.g., only emit a value if it is emitted by all of its children?)
        Hide
        Gabriel Reid added a comment -

        FWIW, I follow Dave's reasoning on the cleanup method. Is there any specific reason not to implement it (regardless of whether it makes sense to override it or not)?

        Show
        Gabriel Reid added a comment - FWIW, I follow Dave's reasoning on the cleanup method. Is there any specific reason not to implement it (regardless of whether it makes sense to override it or not)?
        Hide
        Gabriel Reid added a comment -

        Oops, sorry, I was clearly too quick on committing that. I'll re-open for now until we clear out what to do with the cleanup call.

        Show
        Gabriel Reid added a comment - Oops, sorry, I was clearly too quick on committing that. I'll re-open for now until we clear out what to do with the cleanup call.
        Hide
        Gabriel Reid added a comment -

        Committed to master

        Show
        Gabriel Reid added a comment - Committed to master
        Hide
        Dave Beech added a comment -

        Thanks for the clarification about initialize / cleanup.

        I think a call to cleanup should be added, if only to guard against the inevitable case where somebody decides to do something "clever", overrides FilterFn's cleanup method, and wonders why it isn't being called! Either that or make cleanup on FilterFn final to enforce the contract, but I don't like this as much. What do you think?

        Show
        Dave Beech added a comment - Thanks for the clarification about initialize / cleanup. I think a call to cleanup should be added, if only to guard against the inevitable case where somebody decides to do something "clever", overrides FilterFn's cleanup method, and wonders why it isn't being called! Either that or make cleanup on FilterFn final to enforce the contract, but I don't like this as much. What do you think?
        Hide
        Josh Wills added a comment -

        +1

        Show
        Josh Wills added a comment - +1
        Hide
        Gabriel Reid added a comment -

        Patch with the call to this.initialize added in the setContext method.

        Show
        Gabriel Reid added a comment - Patch with the call to this.initialize added in the setContext method.
        Hide
        Gabriel Reid added a comment -

        Nah, it was plenty clear, but I still wanted to make sure that I wasn't getting confused.

        Show
        Gabriel Reid added a comment - Nah, it was plenty clear, but I still wanted to make sure that I wasn't getting confused.
        Hide
        Josh Wills added a comment -

        Yes, that is what I meant. Sorry for not making it clear.

        Show
        Josh Wills added a comment - Yes, that is what I meant. Sorry for not making it clear.
        Hide
        Gabriel Reid added a comment -

        Josh, I take it you mean a call to this.initialize() should be added, and not a call to the delegate FilterFn's initialize, right?

        Show
        Gabriel Reid added a comment - Josh, I take it you mean a call to this.initialize() should be added, and not a call to the delegate FilterFn's initialize, right?
        Hide
        Josh Wills added a comment -

        The initialize() calls in the children will get handled by calling setContext on them, but adding a call to initialize() to the setContext overrides in AndFn, NotFn, and OrFn is a good idea (even if they probably won't get called.)

        An override to cleanup() is unnecessary here IMO, since the contract on FilterFn is like the contract on MapFn: each record should be handled independently according to a accept-or-not test condition.

        Show
        Josh Wills added a comment - The initialize() calls in the children will get handled by calling setContext on them, but adding a call to initialize() to the setContext overrides in AndFn, NotFn, and OrFn is a good idea (even if they probably won't get called.) An override to cleanup() is unnecessary here IMO, since the contract on FilterFn is like the contract on MapFn: each record should be handled independently according to a accept-or-not test condition.
        Hide
        Dave Beech added a comment -

        Thanks Gabriel. Along the same lines, should there be delegate calls to initialize and cleanup as well?

        Show
        Dave Beech added a comment - Thanks Gabriel. Along the same lines, should there be delegate calls to initialize and cleanup as well?
        Hide
        Josh Wills added a comment -

        Ack, sorry-- should have read the patch first. Please proceed.

        Show
        Josh Wills added a comment - Ack, sorry-- should have read the patch first. Please proceed.
        Hide
        Josh Wills added a comment -

        Hey Gabriel-- configure() is for client-side ops-- we would want to delegate to initialize().

        Show
        Josh Wills added a comment - Hey Gabriel-- configure() is for client-side ops-- we would want to delegate to initialize().
        Hide
        Gabriel Reid added a comment -

        Nice catch Dave, thanks for that. I've updated the patch with a delegate call to the configure method as well – does that sound good to you?

        Show
        Gabriel Reid added a comment - Nice catch Dave, thanks for that. I've updated the patch with a delegate call to the configure method as well – does that sound good to you?
        Hide
        Dave Beech added a comment -

        Patch to add delegating setContext calls

        Show
        Dave Beech added a comment - Patch to add delegating setContext calls

          People

          • Assignee:
            Gabriel Reid
            Reporter:
            Dave Beech
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development