  1. Crunch (Retired)
  2. CRUNCH-211

Add one-to-many join functionality

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.7.0
    • Component/s: None
    • Labels: None

    Description

      A common pattern is a join between two tables where the left-side table contains a single value per key, and the right-side table contains multiple values per key. An example of such a join would be a join between users and web click entries:

      PTable<Long,User> usersById = ...;
      PTable<Long,WebClick> webClicksByUserId = ...;

      In some cases it is desirable to process each User together with an Iterable of all of that user's WebClicks. The current join functionality replicates the User for each WebClick it is related to, so each (User, WebClick) pair has to be handled completely separately.

      Currently, the only way to get an Iterable of WebClicks together with a single User in a single method call is to materialize all WebClicks per user in memory, using something like PTable#collectValues. This approach doesn't work when a user has a large number of WebClicks.
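The two result shapes described above can be illustrated with plain Java collections. This is only a sketch and does not use the Crunch API: the class JoinShapesSketch, the Click type, and both method names are hypothetical, and Strings stand in for User and WebClick.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class JoinShapesSketch {

    /** Hypothetical stand-in for one (userId, WebClick) table entry. */
    static final class Click {
        final long userId;
        final String url;

        Click(long userId, String url) {
            this.userId = userId;
            this.url = url;
        }
    }

    /** Shape of a plain inner join: one row per click, with the user repeated on every row. */
    static List<String> replicatedJoin(Map<Long, String> usersById, List<Click> clicks) {
        List<String> rows = new ArrayList<>();
        for (Click c : clicks) {
            String user = usersById.get(c.userId);
            if (user != null) {
                rows.add(user + " " + c.url);
            }
        }
        return rows;
    }

    /** Shape of the collect-values workaround: all clicks of a user are held in memory at once. */
    static List<String> collectedJoin(Map<Long, String> usersById, List<Click> clicks) {
        Map<Long, List<String>> byUser = new LinkedHashMap<>();
        for (Click c : clicks) {
            byUser.computeIfAbsent(c.userId, k -> new ArrayList<>()).add(c.url);
        }
        List<String> rows = new ArrayList<>();
        for (Map.Entry<Long, List<String>> e : byUser.entrySet()) {
            String user = usersById.get(e.getKey());
            if (user != null) {
                rows.add(user + "=" + e.getValue());
            }
        }
        return rows;
    }
}
```

In the second method the per-user list grows with the number of clicks, which is the memory problem this ticket is concerned with.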

      The intention of this ticket is to add functionality whereby the User and an Iterable of that user's WebClicks are available in a single method call, without the WebClicks being materialized in memory (i.e. an approach that remains feasible for millions of WebClicks or more).
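One possible way to get this behaviour is sketched below in plain Java. It is not the attached patch and not the Crunch API. It assumes the records for each key arrive sorted, with the single left-side value first and the right-side values after it. All names (OneToManyJoinSketch, Tagged, Peeking, oneToManyJoin) are hypothetical, and Strings stand in for User and WebClick. The callback receives a single-pass Iterable backed directly by the input stream, so the right-side values are never collected into a list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.BiConsumer;

final class OneToManyJoinSketch {

    /** One record of the sorted stream; the left value of a key is assumed to come first. */
    static final class Tagged {
        final long key;
        final boolean isLeft;
        final String value;

        Tagged(long key, boolean isLeft, String value) {
            this.key = key;
            this.isLeft = isLeft;
            this.value = value;
        }
    }

    /** Minimal one-element lookahead over the input; callers check hasNext() before peek(). */
    static final class Peeking {
        private final Iterator<Tagged> it;
        private Tagged buffered;

        Peeking(Iterator<Tagged> it) { this.it = it; }

        boolean hasNext() { return buffered != null || it.hasNext(); }

        Tagged peek() {
            if (buffered == null) {
                buffered = it.next();
            }
            return buffered;
        }

        Tagged next() {
            Tagged t = peek();
            buffered = null;
            return t;
        }
    }

    /** Calls fn once per left value, passing a lazy, single-pass view of the matching right values. */
    static void oneToManyJoin(Iterator<Tagged> sorted, BiConsumer<String, Iterable<String>> fn) {
        final Peeking in = new Peeking(sorted);
        while (in.hasNext()) {
            Tagged head = in.next();
            if (!head.isLeft) {
                continue; // right value without a left value: dropped in this sketch
            }
            final long key = head.key;
            final Iterator<String> rights = new Iterator<String>() {
                @Override
                public boolean hasNext() {
                    return in.hasNext() && in.peek().key == key && !in.peek().isLeft;
                }

                @Override
                public String next() {
                    if (!hasNext()) {
                        throw new NoSuchElementException();
                    }
                    return in.next().value;
                }
            };
            // In this sketch fn is also called when a left value has no right values.
            fn.accept(head.value, () -> rights);
            // Skip any right values the callback left unconsumed.
            while (rights.hasNext()) {
                rights.next();
            }
        }
    }

    /** Counts clicks per user without ever building a per-user list. */
    static List<String> demo() {
        List<Tagged> sorted = Arrays.asList(
            new Tagged(1L, true, "alice"),
            new Tagged(1L, false, "/home"),
            new Tagged(1L, false, "/cart"),
            new Tagged(2L, true, "bob"),
            new Tagged(2L, false, "/search"));
        List<String> out = new ArrayList<>();
        oneToManyJoin(sorted.iterator(), (user, clicks) -> {
            int n = 0;
            for (String click : clicks) {
                n++;
            }
            out.add(user + ":" + n);
        });
        return out;
    }
}
```

Because the Iterable is backed by the underlying stream, it can only be traversed once inside the callback. That is the trade-off for not holding the values in memory.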

      Attachments

        1. CRUNCH-211.patch
          11 kB
          Gabriel Reid
        2. CRUNCH-211.patch
          11 kB
          Gabriel Reid

          People

            Assignee: Unassigned
            Reporter: Gabriel Reid