  1. Crunch
  2. CRUNCH-655

DefaultJoinStrategy full outer join failing for Spark pipeline

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Spark
    • Labels:
      None
    • Environment:
      Mac OS X with Crunch 0.13.0 and 0.15.0 (reproduced with the attached code); Ubuntu 14.04 (reproduction code not tried, but a similar issue occurs in production with Crunch 0.13.0)

      Description

      When the left and right tables in the join have entries with the same key, those entries do not always end up together in the output. I cannot reproduce this when running the join with a single reducer. It happens more often when there are many reducers and very few copies of each key on the left and right.

      My guess is that the left value for key k sometimes ends up on a different reducer from the right value for key k.
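
      The guess above can be illustrated without Crunch or Spark. The following is a minimal sketch in plain Java, assuming a toy model in which each reducer joins only the records routed to it and each key appears at most once per side. The class name, method names, and the "key:left,right" row format are hypothetical, and the sketch makes no claim about how DefaultJoinStrategy actually partitions data.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.ToIntFunction;

/** Toy model of a reduce-side full outer join (hypothetical; not Crunch code). */
class PartitionMismatchSketch {

    /** Routes every (key, value) record of a table to one of numReducers buckets. */
    static List<Map<String, String>> partition(Map<String, String> table,
                                               int numReducers,
                                               ToIntFunction<String> partitioner) {
        List<Map<String, String>> buckets = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            buckets.add(new TreeMap<>());
        }
        for (Map.Entry<String, String> e : table.entrySet()) {
            int reducer = Math.floorMod(partitioner.applyAsInt(e.getKey()), numReducers);
            buckets.get(reducer).put(e.getKey(), e.getValue());
        }
        return buckets;
    }

    /**
     * Each reducer full-outer-joins only the records it received.
     * Rows are formatted as "key:left,right", with "null" for a missing side.
     */
    static List<String> fullOuterJoin(Map<String, String> left,
                                      Map<String, String> right,
                                      int numReducers,
                                      ToIntFunction<String> leftPartitioner,
                                      ToIntFunction<String> rightPartitioner) {
        List<Map<String, String>> leftParts = partition(left, numReducers, leftPartitioner);
        List<Map<String, String>> rightParts = partition(right, numReducers, rightPartitioner);
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            // Keys seen by this reducer from either side.
            TreeSet<String> keys = new TreeSet<>(leftParts.get(i).keySet());
            keys.addAll(rightParts.get(i).keySet());
            for (String k : keys) {
                rows.add(k + ":" + leftParts.get(i).get(k) + "," + rightParts.get(i).get(k));
            }
        }
        Collections.sort(rows);
        return rows;
    }
}
```

      In this model, using the same partitioner for both sides keeps matching keys together for any reducer count. If the two sides are routed differently, a key present on both sides is emitted as two half-empty rows, except when numReducers is 1, which is consistent with the single-reducer observation.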

      In production, the issue went away when I either used a single reducer or used cogroup instead of the join.
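
      A cogroup-based workaround first collects all values for a key, from both sides, into one group and only then pairs them up. The following is a minimal sketch of that idea in plain Java, assuming in-memory lists of {key, value} arrays in place of Crunch tables. The class and method names and the "key:left,right" row format are hypothetical; this is not the code used in production.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Full outer join built on a cogroup-style grouping (hypothetical; not Crunch code). */
class CogroupOuterJoinSketch {

    /** Creates an empty group: index 0 holds left values, index 1 holds right values. */
    static List<List<String>> emptyGroup() {
        List<List<String>> group = new ArrayList<>();
        group.add(new ArrayList<>());
        group.add(new ArrayList<>());
        return group;
    }

    /** Groups both inputs by key, so each key sees all of its left and right values. */
    static Map<String, List<List<String>>> cogroup(List<String[]> left, List<String[]> right) {
        Map<String, List<List<String>>> groups = new TreeMap<>();
        for (String[] kv : left) {
            groups.computeIfAbsent(kv[0], k -> emptyGroup()).get(0).add(kv[1]);
        }
        for (String[] kv : right) {
            groups.computeIfAbsent(kv[0], k -> emptyGroup()).get(1).add(kv[1]);
        }
        return groups;
    }

    /** Emits "key:left,right" rows, with "null" standing in for a missing side. */
    static List<String> fullOuterJoin(List<String[]> left, List<String[]> right) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, List<List<String>>> e : cogroup(left, right).entrySet()) {
            String key = e.getKey();
            List<String> leftValues = e.getValue().get(0);
            List<String> rightValues = e.getValue().get(1);
            if (leftValues.isEmpty()) {
                for (String r : rightValues) {
                    rows.add(key + ":null," + r);
                }
            } else if (rightValues.isEmpty()) {
                for (String l : leftValues) {
                    rows.add(key + ":" + l + ",null");
                }
            } else {
                // Key present on both sides: emit every left/right combination.
                for (String l : leftValues) {
                    for (String r : rightValues) {
                        rows.add(key + ":" + l + "," + r);
                    }
                }
            }
        }
        return rows;
    }
}
```

      Because the pairing happens after the grouping, a key's left and right values cannot be separated in this model.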

      I've attached a class to reproduce the issue.

      1. OuterJoinTest.java
        3 kB
        Mikael Goldmann

        Activity

        There are no comments yet on this issue.

          People

          • Assignee: Unassigned
          • Reporter: Mikael Goldmann (migoldmann)