Solr / SOLR-9550

innerJoin can succeed with bad sorting

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 6.1
    • Fix Version/s: None
    • Component/s: SolrCloud
    • Labels: None
    • Environment: CentOS 6.8, OpenJDK 1.8

    Description

      The innerJoin streaming function requires both streams to be sorted by the keys used in the join. In some situations, you can mistakenly use the wrong sort order and still get a successful (but incorrect) result.

      Example:

      • Collection "UserPosts" has columns: ID, ByUserID
      • Collection "User" has columns: ID, Username, Registered, …
      • Streaming query gatherNodes(User, gatherNodes(UserPosts, walk="42 69->ID", gather="ByUserID"), walk="node->ID", gather="ID") returns the IDs of users who made posts 42 and 69, but we want the full user details
      • Streaming query innerJoin(sort(gatherNodes(User, gatherNodes(UserPosts, walk="42 69->ID", gather="ByUserID"), walk="node->ID", gather="ID"), by="ID asc"), search(User,qt="/export",q="*:*",fl="ID, Username, Registered, …", sort="ID asc"), on="node=ID") (Note that the first stream uses sort(…, by="ID") because we are gathering the ID field. It should use sort(…, by="node"), because gatherNodes returns each tuple with the gathered ID in the "node" field.)

      (Note: This example is simplified, so while there may be a better way to perform this specific query, the concept and the underlying issue remain.)

      Expected result: Solr throws a useful exception saying that the sort order does not match the join keys (the first stream is sorted by ID, but the join is on node=ID), as it does when the sort() call is omitted.
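
      The kind of check being asked for can be sketched as follows. This is only an illustration in Python, not Solr's actual implementation: the names SortMismatchError, parse_sort_fields, parse_join_fields and validate_join_sorts are hypothetical, and the sketch assumes that each stream's declared sort must begin with the fields on its own side of the join.

```python
from typing import List, Tuple


class SortMismatchError(ValueError):
    """Hypothetical error: a stream's declared sort does not match its join fields."""


def parse_sort_fields(sort_spec: str) -> List[str]:
    """Return the field names from a sort spec such as 'ID asc, Username desc'."""
    return [part.split()[0] for part in sort_spec.split(",") if part.strip()]


def parse_join_fields(on_spec: str) -> List[Tuple[str, str]]:
    """Split 'node=ID' into [('node', 'ID')]; a bare 'ID' becomes ('ID', 'ID')."""
    pairs = []
    for part in on_spec.split(","):
        left, _, right = part.strip().partition("=")
        pairs.append((left.strip(), (right or left).strip()))
    return pairs


def validate_join_sorts(left_sort: str, right_sort: str, on_spec: str) -> None:
    """Raise unless each stream is sorted by its own side of the join keys."""
    pairs = parse_join_fields(on_spec)
    left_keys = [left for left, _ in pairs]
    right_keys = [right for _, right in pairs]
    count = len(pairs)
    # Assumption: the leading sort fields must equal the join fields, in order.
    if parse_sort_fields(left_sort)[:count] != left_keys:
        raise SortMismatchError(
            f"left stream is sorted by '{left_sort}' but is joined on {left_keys}"
        )
    if parse_sort_fields(right_sort)[:count] != right_keys:
        raise SortMismatchError(
            f"right stream is sorted by '{right_sort}' but is joined on {right_keys}"
        )


# The corrected query from this report: left stream sorted by node, right by ID.
validate_join_sorts("node asc", "ID asc", "node=ID")

# The mistaken query from this report: left stream sorted by ID instead of node.
try:
    validate_join_sorts("ID asc", "ID asc", "node=ID")
except SortMismatchError as error:
    print(f"rejected: {error}")
```

      Under these assumptions, the mistaken query is rejected because the left stream's sort field (ID) differs from the left side of the join (node), even though ID happens to match the right side.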

      Actual result: Solr treats the streams as correctly sorted and returns each node from the first stream joined with a single set of values from the second stream (every row is joined to the same row). The returned ID and node values therefore do not match, even though they are the fields used in the join equality.

      This is an easy mistake to make: I was gathering IDs, so I automatically sorted by ID when I should have sorted by node.

          People

            Assignee: Unassigned
            Reporter: Stuart Bertram (SJBertram)