  1. Solr
  2. SOLR-8925

Add gatherNodes Streaming Expression to support breadth first traversals

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Implemented
    • Affects Version/s: None
    • Fix Version/s: 6.1
    • Component/s: None
    • Labels:
      None

      Description

      The gatherNodes Streaming Expression is a flexible, general-purpose breadth-first graph traversal. It uses the same parallel join under the covers as SOLR-8888, but is much more generalized and can be used for a wide range of use cases.

      Sample syntax:

       gatherNodes(friends,
                   gatherNodes(friends,
                               search(articles, q="body:(queryA)", fl="author"),
                               walk="author->user",
                               gather="friend"),
                   walk="friend->user",
                   gather="friend",
                   scatter="branches, leaves")
      

      The expression above is evaluated as follows:

      1) The inner search() expression is evaluated on the articles collection, emitting a Stream of Tuples with the author field populated.
      2) The inner gatherNodes() expression reads the Tuples from the search() stream and traverses to the friends collection by performing a distributed join between the articles.author and friends.user fields. It gathers the value from the friend field during the join.
      3) The inner gatherNodes() expression then emits the friend Tuples. By default, the gatherNodes function emits only the leaves, which in this case are the friend Tuples.
      4) The outer gatherNodes() expression reads the friend Tuples and traverses the friends collection again, this time joining the friend values emitted in step 3 against the friends.user field. This collects the friends of friends.
      5) The outer gatherNodes() expression emits the branches and leaves of the graph that was collected. This is controlled by the "scatter" parameter. In the example, the root nodes are the authors, the branches are the authors' friends, and the leaves are the friends of friends.
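
The steps above can be sketched in plain Python. This is only an illustration of the two-hop breadth-first traversal, not the Solr implementation: the in-memory lists, the `gather_nodes` helper, and the per-hop de-duplication are assumptions made for the example, and nothing here is distributed.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the "articles" and "friends" collections.
articles = [
    {"author": "alice", "body": "queryA in the text"},
    {"author": "zed", "body": "unrelated"},
]
friends = [
    {"user": "alice", "friend": "bob"},
    {"user": "alice", "friend": "carol"},
    {"user": "bob", "friend": "dave"},
    {"user": "carol", "friend": "dave"},
    {"user": "carol", "friend": "erin"},
]


def gather_nodes(collection, roots, walk_to, gather):
    """One hop: join the root values against collection[walk_to]
    and collect the distinct values of collection[gather]."""
    index = defaultdict(list)
    for doc in collection:
        index[doc[walk_to]].append(doc[gather])
    seen = set()
    gathered = []
    for root in roots:
        for value in index.get(root, []):
            if value not in seen:  # assumption: each node is emitted once per hop
                seen.add(value)
                gathered.append(value)
    return gathered


# Step 1: the search() stand-in emits the author field of matching articles.
authors = [a["author"] for a in articles if "queryA" in a["body"]]
# Steps 2-3: inner hop, walk author->user and gather friend.
branches = gather_nodes(friends, authors, "user", "friend")
# Step 4: outer hop, walk friend->user and gather friend.
leaves = gather_nodes(friends, branches, "user", "friend")
# Step 5: scatter="branches, leaves" emits both levels.
scattered = branches + leaves
print(scattered)  # ['bob', 'carol', 'dave', 'erin']
```

Each hop is a join followed by a projection of the gathered field, which is why the same helper serves both the inner and the outer call.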

      This traversal is fully distributed and cross collection.

      Aggregations are also supported during the traversal. This can be useful for making recommendations based on co-occurrence counts. Sample syntax:

      top(
            gatherNodes(baskets,
                        search(baskets, q="prodid:X", fl="basketid", rows="500", sort="random_7897987 asc"),
                        walk="basketid->basketid",
                        gather="prodid",
                        fl="prodid, price",
                        count(*),
                        avg(price)),
            n=4,
            sort="count(*) desc, avg(price) asc")
      

      In the expression above, the inner search() function searches the baskets collection for 500 random basketids that contain prodid X.

      gatherNodes then traverses the baskets collection and gathers all the prodids for the selected basketids.
      It also aggregates the count and average price for each prodid collected. The count reflects the co-occurrence count between each gathered prodid and prodid X. The outer top expression selects the top 4 prodids emitted from gatherNodes, based on the co-occurrence count and average price.
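
The aggregation described above can be sketched in Python as follows. This is a rough illustration under stated assumptions: the `baskets` rows and the `co_occurrence` helper are hypothetical, the random sampling of 500 baskets is omitted, and the seed product is not filtered out of the result.

```python
from collections import defaultdict

# Hypothetical rows of the "baskets" collection: one row per (basket, product).
baskets = [
    {"basketid": "b1", "prodid": "X", "price": 10.0},
    {"basketid": "b1", "prodid": "A", "price": 5.0},
    {"basketid": "b1", "prodid": "B", "price": 7.0},
    {"basketid": "b2", "prodid": "X", "price": 10.0},
    {"basketid": "b2", "prodid": "A", "price": 5.0},
    {"basketid": "b3", "prodid": "X", "price": 10.0},
    {"basketid": "b3", "prodid": "B", "price": 9.0},
    {"basketid": "b3", "prodid": "C", "price": 2.0},
    {"basketid": "b4", "prodid": "A", "price": 5.0},
    {"basketid": "b4", "prodid": "C", "price": 2.0},
]


def co_occurrence(rows, seed_prodid, n):
    # search(): basket ids that contain the seed product.
    basket_ids = {r["basketid"] for r in rows if r["prodid"] == seed_prodid}

    # gatherNodes(): walk basketid->basketid, gather prodid,
    # and keep a running count and price sum per gathered prodid.
    counts = defaultdict(int)
    price_sums = defaultdict(float)
    for r in rows:
        if r["basketid"] in basket_ids:
            counts[r["prodid"]] += 1
            price_sums[r["prodid"]] += r["price"]

    stats = [
        {"prodid": p, "count": c, "avg_price": price_sums[p] / c}
        for p, c in counts.items()
    ]
    # top(): sort by count descending, then average price ascending.
    stats.sort(key=lambda s: (-s["count"], s["avg_price"]))
    return stats[:n]


top3 = co_occurrence(baskets, "X", n=3)
print([s["prodid"] for s in top3])  # ['X', 'A', 'B']
```

In this sketch the seed product X ranks first because it co-occurs with itself in every selected basket; a recommender built on this idea would typically drop it from the output.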

      Like all streaming expressions, gatherNodes can be combined with other streaming expressions. For example, the following expression uses a hash join to intersect the friend networks rooted at authors found by two different queries:

      hashInnerJoin(
                    gatherNodes(friends,
                                gatherNodes(friends,
                                            search(articles, q="body:(queryA)", fl="author"),
                                            walk="author->user",
                                            gather="friend"),
                                walk="friend->user",
                                gather="friend",
                                scatter="branches, leaves"),
                    gatherNodes(friends,
                                gatherNodes(friends,
                                            search(articles, q="body:(queryB)", fl="author"),
                                            walk="author->user",
                                            gather="friend"),
                                walk="friend->user",
                                gather="friend",
                                scatter="branches, leaves"),
                    on="friend"
      )
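
As a rough illustration of what the join contributes, the sketch below intersects two already-gathered friend networks on the friend field using an in-memory hash table. The `hash_inner_join` helper and the sample tuples are hypothetical and stand in for the output of the two gatherNodes expressions; this is not how Solr implements the join.

```python
def hash_inner_join(left, right, on):
    """Inner join two lists of tuples (dicts) on a shared field."""
    # Build a hash table over the right side, keyed by the join field.
    table = {}
    for row in right:
        table.setdefault(row[on], []).append(row)
    # Probe the table with each left tuple and emit merged matches.
    joined = []
    for row in left:
        for match in table.get(row[on], []):
            joined.append({**match, **row})
    return joined


# Hypothetical outputs of the queryA and queryB traversals.
network_a = [{"friend": "bob"}, {"friend": "dave"}, {"friend": "erin"}]
network_b = [{"friend": "dave"}, {"friend": "frank"}, {"friend": "erin"}]

common = hash_inner_join(network_a, network_b, on="friend")
print([t["friend"] for t in common])  # ['dave', 'erin']
```

Only friends reachable from both sets of authors survive the join, which is the intersection the expression above is after.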
      

        Attachments

        1. SOLR-8925.patch
          51 kB
          Joel Bernstein
        2. SOLR-8925.patch
          49 kB
          Joel Bernstein
        3. SOLR-8925.patch
          46 kB
          Joel Bernstein
        4. SOLR-8925.patch
          45 kB
          Joel Bernstein
        5. SOLR-8925.patch
          38 kB
          Joel Bernstein
        6. SOLR-8925.patch
          30 kB
          Joel Bernstein
        7. SOLR-8925.patch
          29 kB
          Joel Bernstein
        8. SOLR-8925.patch
          24 kB
          Joel Bernstein

              People

              • Assignee:
                joel.bernstein Joel Bernstein
                Reporter:
                joel.bernstein Joel Bernstein
              • Votes:
                0
                Watchers:
                7
