Crunch / CRUNCH-192

Document and enforce the semantics around reducer-based Iterables


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.6.0
    • Component/s: None
    • Labels: None

      Description

      As reported on user@crunch.apache.org by Chad Urso McDaniel:

      BLUF: The Iterable parameter to CombineFn.process implies that it can be iterated over multiple times, when in fact it cannot, and this leads to surprising behavior.

      As many of you probably know, the signature of CombineFn.process is

      process(Pair<K, Iterable<V>> input, Emitter<Pair<K, V>> emitter)

      The corresponding Hadoop Reducer signature (in the older org.apache.hadoop.mapred API) is

      reduce(K2 key, Iterator<V2> values, OutputCollector<K3,V3> output, Reporter reporter)

      I assume Crunch uses Iterable so that the values can be consumed conveniently in a for-each loop.

      Unfortunately, this Iterable appears to return the same Iterator object each time Iterable.iterator() is called.
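      The following minimal, self-contained sketch in plain Java (no Crunch or Hadoop dependencies) makes this surprise concrete. It assumes the reducer-side Iterable simply hands back the single Iterator supplied by the framework. The class `ReducerValues` and the helper `sum` are hypothetical illustrations, not Crunch's actual implementation. The first for-each loop sees every value, while the second silently sees none.

```java
import java.util.Arrays;
import java.util.Iterator;

public class SharedIteratorDemo {

    // Hypothetical stand-in for a reducer-backed Iterable: it wraps the single
    // Iterator handed over by the framework and returns that same object from
    // every call to iterator().
    static final class ReducerValues<V> implements Iterable<V> {
        private final Iterator<V> values;

        ReducerValues(Iterator<V> values) {
            this.values = values;
        }

        @Override
        public Iterator<V> iterator() {
            // Same instance every time: elements consumed by an earlier loop are gone.
            return values;
        }
    }

    // Sums the values with a for-each loop, which calls iterator() once per loop.
    static int sum(Iterable<Integer> input) {
        int total = 0;
        for (int v : input) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        Iterable<Integer> input = new ReducerValues<>(Arrays.asList(1, 2, 3).iterator());
        System.out.println(sum(input)); // first pass: 6
        System.out.println(sum(input)); // second pass: 0, the shared iterator is already exhausted
    }
}
```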

      This makes sense given the underlying Hadoop MapReduce implementation, but it violates what I think most people expect from the Iterable interface.
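      Presumably the framework streams the values for a key instead of holding them all in memory, so the values cannot be replayed. A caller that genuinely needs more than one pass therefore has to buffer the values itself. The sketch below takes that approach. The helpers `buffer` and `variance` are hypothetical, and the approach only works if all values for a single key fit on the heap. With Writable-based values, Hadoop may also reuse one value object across the iteration, so a real buffer may need to copy each value. The sketch uses immutable `Double` values, which sidesteps that issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class BufferedValuesDemo {

    // Copies every value from a single-pass Iterable into a List so the caller
    // can make as many passes as it needs. This trades memory for
    // re-iterability: all values for the key must fit on the heap.
    static <V> List<V> buffer(Iterable<V> singlePass) {
        List<V> copy = new ArrayList<>();
        for (V v : singlePass) {
            copy.add(v);
        }
        return copy;
    }

    // Two-pass population variance: one pass for the mean, a second for the
    // squared deviations. Without buffering, the second pass would see nothing.
    static double variance(Iterable<Double> input) {
        List<Double> values = buffer(input);
        if (values.isEmpty()) {
            throw new IllegalArgumentException("no values");
        }
        double mean = 0.0;
        for (double v : values) {
            mean += v;
        }
        mean /= values.size();
        double sumSq = 0.0;
        for (double v : values) {
            sumSq += (v - mean) * (v - mean);
        }
        return sumSq / values.size();
    }

    public static void main(String[] args) {
        Iterator<Double> once = Arrays.asList(2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0).iterator();
        Iterable<Double> singlePass = () -> once; // returns the same Iterator every time
        System.out.println(variance(singlePass)); // 4.0
    }
}
```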

      I understand that it's too late to change the interface, but could we at least have a Javadoc note, or an exception thrown if the Iterable is used more than once?
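      One way to enforce single use is to wrap the values in an Iterable that throws IllegalStateException on the second call to iterator(). This sketch assumes that failing fast is preferable to silently returning an exhausted iterator. The class name `SingleUseIterable` and the error message are illustrative only, and this is not necessarily what the attached patches implement. The `used` flag is not synchronized, on the assumption that the Iterable is consumed by a single thread, as is normal inside a reduce task.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical wrapper: a second call to iterator() fails loudly instead of
// silently yielding an already-exhausted iterator.
public class SingleUseIterable<T> implements Iterable<T> {

    private final Iterable<T> delegate;
    private boolean used = false; // not thread-safe; assumes single-threaded use

    public SingleUseIterable(Iterable<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public Iterator<T> iterator() {
        if (used) {
            throw new IllegalStateException(
                "This Iterable can only be iterated over once; copy the values if multiple passes are needed");
        }
        used = true;
        return delegate.iterator();
    }

    public static void main(String[] args) {
        Iterable<Integer> values = new SingleUseIterable<>(Arrays.asList(1, 2, 3));
        for (int v : values) {
            System.out.println(v); // first pass prints 1, 2, 3
        }
        try {
            for (int v : values) {
                System.out.println(v); // never reached
            }
        } catch (IllegalStateException e) {
            System.out.println("second pass rejected: " + e.getMessage());
        }
    }
}
```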

        Attachments

        1. CRUNCH-192.patch (9 kB, Gabriel Reid)
        2. CRUNCH-192.patch.v2 (11 kB, Gabriel Reid)
        3. CRUNCH-192.patch.v3 (11 kB, Gabriel Reid)


              People

              • Assignee: Gabriel Reid
              • Reporter: Gabriel Reid
              • Votes: 0
              • Watchers: 4
