SOLR-8188

Add hash style joins to the Streaming API and Streaming Expressions

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Implemented
    • Affects Version/s: None
    • Fix Version/s: 6.0
    • Component/s: SolrJ
    • Labels:
      None

      Description

      Add HashJoinStream and OuterHashJoinStream to the Streaming API to allow for optimized joining between sub-streams.

      HashJoinStream is similar to an InnerJoinStream except that it does not require the incoming streams to be in any particular order, and it reads all tuples from the stream being hashed (hashStream) when open() is called. Each call to read() takes the next tuple from the stream not being hashed (fullStream) that has at least one matching tuple in hashStream and returns the merge of the two tuples. If a fullStream tuple matches more than one hashStream tuple, each subsequent call to read() returns the merge with the next matching tuple. The resulting stream has the same order as the fullStream.

      OuterHashJoinStream is similar to a HashJoinStream, but like a LeftOuterJoinStream it returns a tuple from fullStream even if that tuple has no matching record in hashStream. All other behavior is identical to HashJoinStream.
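
      The behavior described above can be sketched in plain Java. This is only an illustration of the hash join idea, not the code in the attached patch: the class HashJoinSketch, its methods, and the use of in-memory lists and maps in place of real streams and tuples are all assumptions made for the example. The source does not say which side wins when both tuples carry the same non-join field, so the sketch arbitrarily lets the hashed tuple overwrite.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the HashJoinStream implementation.
public class HashJoinSketch {

    // Helper to build a tuple from alternating field names and values.
    public static Map<String, Object> tuple(String... kv) {
        Map<String, Object> t = new LinkedHashMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) {
            t.put(kv[i], kv[i + 1]);
        }
        return t;
    }

    // Lookup key made of the values of the "on" fields, kept as a list so
    // that equality is decided field by field.
    static List<Object> key(Map<String, Object> t, List<String> on) {
        List<Object> k = new ArrayList<>();
        for (String field : on) {
            k.add(t.get(field));
        }
        return k;
    }

    public static List<Map<String, Object>> join(
            List<Map<String, Object>> fullStream,
            List<Map<String, Object>> hashStream,
            List<String> on,
            boolean outer) {
        // Corresponds to open(): read the entire hashed side into memory.
        Map<List<Object>, List<Map<String, Object>>> hashed = new HashMap<>();
        for (Map<String, Object> t : hashStream) {
            hashed.computeIfAbsent(key(t, on), k -> new ArrayList<>()).add(t);
        }
        // Corresponds to repeated read() calls: walk fullStream in order.
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> full : fullStream) {
            List<Map<String, Object>> matches = hashed.get(key(full, on));
            if (matches == null) {
                if (outer) {
                    out.add(new LinkedHashMap<>(full)); // outer join keeps unmatched tuples
                }
                continue;
            }
            for (Map<String, Object> match : matches) {
                Map<String, Object> merged = new LinkedHashMap<>(full);
                merged.putAll(match); // assumption: hashed side overwrites on collision
                out.add(merged);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> full = new ArrayList<>();
        full.add(tuple("fieldA", "1", "fieldC", "x"));
        full.add(tuple("fieldA", "2", "fieldC", "y"));
        List<Map<String, Object>> hash = new ArrayList<>();
        hash.add(tuple("fieldA", "1", "fieldE", "e1"));
        hash.add(tuple("fieldA", "1", "fieldE", "e2"));
        List<String> on = new ArrayList<>();
        on.add("fieldA");
        System.out.println(join(full, hash, on, false));
        System.out.println(join(full, hash, on, true));
    }
}
```

      In this sketch the inner variant emits one merged tuple per match and skips the unmatched fullStream tuple, while the outer variant also emits the unmatched tuple unchanged.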

      In expression form

      hashJoin(
        search(collection1, q=*:*, fl="fieldA, fieldB, fieldC", ...),
        hashed=search(collection2, q=*:*, fl="fieldA, fieldB, fieldE", ...),
        on="fieldA, fieldB"
      )
      
      outerHashJoin(
        search(collection1, q=*:*, fl="fieldA, fieldB, fieldC", ...),
        hashed=search(collection2, q=*:*, fl="fieldA, fieldB, fieldE", ...),
        on="fieldA, fieldB"
      )
      

      As you can see, the hashStream is a named parameter, which makes it very clear which stream should be hashed.

      1. SOLR-8188.patch
        25 kB
        Dennis Gove
      2. SOLR-8188.patch
        25 kB
        Dennis Gove
      3. SOLR-8188.patch
        25 kB
        Dennis Gove

          Activity

          dpgove Dennis Gove added a comment -

          All tests pass.

          dpgove Dennis Gove added a comment - - edited

          Added a field separator to the hash calculation. This prevents two tuples from ending up with the same hashed value when their field values differ, for example:

          t1.fieldA = "foo"
          t1.fieldB = "bar"

          t2.fieldA = "foob"
          t2.fieldB = "ar"

          With this change the hash will be different for t1 and t2.
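
          The collision in this example can be reproduced with a small Java sketch. It assumes the key is built by concatenating the field values as strings, which is what the example implies; the class JoinKeySketch, its method names, and the "::" separator are hypothetical and are not taken from the patch.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of why a field separator matters.
public class JoinKeySketch {

    // Concatenates the field values with nothing between them.
    public static String keyWithoutSeparator(List<String> values) {
        return String.join("", values);
    }

    // Concatenates the field values with a separator between them.
    // The "::" separator is an assumption chosen for this example.
    public static String keyWithSeparator(List<String> values) {
        return String.join("::", values);
    }

    public static void main(String[] args) {
        List<String> t1 = Arrays.asList("foo", "bar");  // t1.fieldA, t1.fieldB
        List<String> t2 = Arrays.asList("foob", "ar");  // t2.fieldA, t2.fieldB

        // Without a separator both tuples produce the key "foobar".
        System.out.println(keyWithoutSeparator(t1).equals(keyWithoutSeparator(t2)));

        // With a separator the keys are "foo::bar" and "foob::ar".
        System.out.println(keyWithSeparator(t1).equals(keyWithSeparator(t2)));
    }
}
```

          A separator only removes this ambiguity as long as the separator string itself cannot appear inside a field value.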

          jira-bot ASF subversion and git services added a comment -

          Commit 1713950 from dpgove@apache.org in branch 'dev/trunk'
          [ https://svn.apache.org/r1713950 ]

          SOLR-8188: Adds Hash and OuterHash Joins to the Streaming API and Streaming Expressions

          dpgove Dennis Gove added a comment -

          Forgot to attach a slightly modified patch file (rebased off trunk).

          dpgove Dennis Gove added a comment -

          This is the patch that was applied to trunk.

          dpgove Dennis Gove added a comment -

          Still closed


            People

            • Assignee:
              dpgove Dennis Gove
              Reporter:
              dpgove Dennis Gove