Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-9077

Streaming expressions should support collection alias

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 5.5.1
    • Fix Version/s: 6.4
    • Component/s: SolrCloud
    • Labels:
      None

      Description

      Streaming expressions should support collection aliases.

      When I tried to access collection alias I get a null pointer exception.

      issue seems to be related to following code: clusterState.getActiveSlices returns null

      Collection<Slice> slices = clusterState.getActiveSlices(this.collection);
      

      fix seems to fairly simple , clusterState.getActiveSlices can be made aware of collection alias. I am not sure what will happen when we have large alias which has hundred of slices.

      1. SOLR-9077.patch
        146 kB
        Kevin Risden
      2. SOLR-9077.patch
        142 kB
        Kevin Risden
      3. SOLR-9077.patch
        142 kB
        Kevin Risden
      4. SOLR-9077.patch
        141 kB
        Kevin Risden
      5. SOLR-9077.patch
        14 kB
        Kevin Risden

        Activity

        Hide
        risdenk Kevin Risden added a comment -

        Dennis Gove - Looks like there is some related code to getting info about aliases here:

        https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/impl/CloudSolrClient.java#L1299

        In CloudSolrStream, all that looks like it needs to be done is get all the collections (from aliases too) and then add all the replicas from those collections. I can put a patch together for that.

        Are there other places where aliases would be used that I need to check?

        Show
        risdenk Kevin Risden added a comment - Dennis Gove - Looks like there is some related code to getting info about aliases here: https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/impl/CloudSolrClient.java#L1299 In CloudSolrStream, all that looks like it needs to be done is get all the collections (from aliases too) and then add all the replicas from those collections. I can put a patch together for that. Are there other places where aliases would be used that I need to check?
        Hide
        dpgove Dennis Gove added a comment -

        It might be necessary to do that in each source stream, though a centralized location might be better. The StreamFactory could be a place to store the aliases after looking them up but I'm not 100% sure that's the best place.

        Show
        dpgove Dennis Gove added a comment - It might be necessary to do that in each source stream, though a centralized location might be better. The StreamFactory could be a place to store the aliases after looking them up but I'm not 100% sure that's the best place.
        Hide
        risdenk Kevin Risden added a comment -

        First pass at trying to make aliases work. There are no tests yet but existing tests pass. This is an approach of refactoring getting slices.

        Show
        risdenk Kevin Risden added a comment - First pass at trying to make aliases work. There are no tests yet but existing tests pass. This is an approach of refactoring getting slices.
        Hide
        risdenk Kevin Risden added a comment -

        Attaching patch with randomized alias vs collection for JdbcTest, StreamingTest, StreamExpressionTest, and JDBCStreamTest. Tests seem to be passing for me locally.

        Joel Bernstein or Dennis Gove - Can you take a look and see if this is sane?

        Show
        risdenk Kevin Risden added a comment - Attaching patch with randomized alias vs collection for JdbcTest, StreamingTest, StreamExpressionTest, and JDBCStreamTest. Tests seem to be passing for me locally. Joel Bernstein or Dennis Gove - Can you take a look and see if this is sane?
        Hide
        risdenk Kevin Risden added a comment -

        Fixed issue w/ StatementImpl for JDBC when using alias.

        Show
        risdenk Kevin Risden added a comment - Fixed issue w/ StatementImpl for JDBC when using alias.
        Hide
        risdenk Kevin Risden added a comment -

        Remove some unused imports

        Show
        risdenk Kevin Risden added a comment - Remove some unused imports
        Hide
        joel.bernstein Joel Bernstein added a comment - - edited

        Looks good from a quick review, but it's a pretty big patch. It will take a little time to take the whole thing in. If you're feeling comfortable with it feel free to commit, I'm sure this code will get exercised plenty before the 6.4 release.

        Show
        joel.bernstein Joel Bernstein added a comment - - edited Looks good from a quick review, but it's a pretty big patch. It will take a little time to take the whole thing in. If you're feeling comfortable with it feel free to commit, I'm sure this code will get exercised plenty before the 6.4 release.
        Hide
        risdenk Kevin Risden added a comment -

        Yea most of the changes are actually test variable changes (COLLECTION -> COLLECTIONORALIAS). Wasn't sure how else to make that change. The important section is confined to CloudSolrStream.getSlices(). The changes I'm concerned with are the FeaturesSectionStream, ParallelStream, TextLogitStream, and TopicStream. These all had getActiveSlices. Not sure the slices HAD to be for a specific collection. Now they should work with all slices from all collections in an alias.

        Any easier way to view is here: https://github.com/apache/lucene-solr/commit/1e9a285d53ee1a57a0229c4eaf6666deb8f13c35

        Show
        risdenk Kevin Risden added a comment - Yea most of the changes are actually test variable changes (COLLECTION -> COLLECTIONORALIAS). Wasn't sure how else to make that change. The important section is confined to CloudSolrStream.getSlices(). The changes I'm concerned with are the FeaturesSectionStream, ParallelStream, TextLogitStream, and TopicStream. These all had getActiveSlices. Not sure the slices HAD to be for a specific collection. Now they should work with all slices from all collections in an alias. Any easier way to view is here: https://github.com/apache/lucene-solr/commit/1e9a285d53ee1a57a0229c4eaf6666deb8f13c35
        Hide
        joel.bernstein Joel Bernstein added a comment -

        ParallelStream should be fine as it's just a bag of worker nodes. We may not want to support aliases for the FeatureSelectionStream, TextLogitStream and TopicStream. In particular the TopicStream relies on checkpoints which are specific to a physical collection. The FeatureSelectionStream and TextLogitStream probably don't need to support alias and may just complicate things.

        Show
        joel.bernstein Joel Bernstein added a comment - ParallelStream should be fine as it's just a bag of worker nodes. We may not want to support aliases for the FeatureSelectionStream, TextLogitStream and TopicStream. In particular the TopicStream relies on checkpoints which are specific to a physical collection. The FeatureSelectionStream and TextLogitStream probably don't need to support alias and may just complicate things.
        Hide
        risdenk Kevin Risden added a comment -

        Currently the matching logic is as follows:

        • collection name case sensitive
        • collection name case insensitive
        • alias name case sensitive

        This means that current behavior for TopicStream, FeatureSelectionStream, and TextLogitStream would be unaffected and would just add the alias support. Are there any known downsides to using aliases other than maybe complicating things? It looks to me that slices are what is used for the checkpoints so that would just work with aliases getting more slices.

        I'll double check the tests for TopicStream, FeatureSelectionStream, and TextLogitStream to make sure those are being tested with aliases.

        One item that I might adjust with the tests is to make sure aliases are pointing to multiple collections. Currently the alias only points to a single collection.

        Show
        risdenk Kevin Risden added a comment - Currently the matching logic is as follows: collection name case sensitive collection name case insensitive alias name case sensitive This means that current behavior for TopicStream, FeatureSelectionStream, and TextLogitStream would be unaffected and would just add the alias support. Are there any known downsides to using aliases other than maybe complicating things? It looks to me that slices are what is used for the checkpoints so that would just work with aliases getting more slices. I'll double check the tests for TopicStream, FeatureSelectionStream, and TextLogitStream to make sure those are being tested with aliases. One item that I might adjust with the tests is to make sure aliases are pointing to multiple collections. Currently the alias only points to a single collection.
        Hide
        suds.s Suds added a comment -

        One issue I faced is I was working with large cluster ~40 nodes (40 shards) in that case it would have many slices per alias not sure if we need to throw error as it may cause issues while creating fixed size threadpool if no of slices > some number?

        Show
        suds.s Suds added a comment - One issue I faced is I was working with large cluster ~40 nodes (40 shards) in that case it would have many slices per alias not sure if we need to throw error as it may cause issues while creating fixed size threadpool if no of slices > some number?
        Hide
        joel.bernstein Joel Bernstein added a comment -

        The problem with aliases with Topics is that encourages using the alias as a pointer to the collection. When someone reassigns the alias to a different collection, it will break the topic even if the data is the same. I think it makes sense to treat topics as an outlier and not support aliases for it.

        Show
        joel.bernstein Joel Bernstein added a comment - The problem with aliases with Topics is that encourages using the alias as a pointer to the collection. When someone reassigns the alias to a different collection, it will break the topic even if the data is the same. I think it makes sense to treat topics as an outlier and not support aliases for it.
        Hide
        risdenk Kevin Risden added a comment -

        Patch that makes TopicStream, FeatureSelectionStream, and TextLogitStream not support aliases. It would be easy to add later by changing the getSlices checkAlias parameter.

        Show
        risdenk Kevin Risden added a comment - Patch that makes TopicStream, FeatureSelectionStream, and TextLogitStream not support aliases. It would be easy to add later by changing the getSlices checkAlias parameter.
        Hide
        risdenk Kevin Risden added a comment -

        Suds - Not sure if this would cause an issue. I would leave that as a follow up if that is an issue. Alias support is missing for Parallel SQL/JDBC and would love to get this in for 6.4.

        Show
        risdenk Kevin Risden added a comment - Suds - Not sure if this would cause an issue. I would leave that as a follow up if that is an issue. Alias support is missing for Parallel SQL/JDBC and would love to get this in for 6.4.
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit ace423e958182aa8ad6329f5cc1dc3ca6cd877c7 in lucene-solr's branch refs/heads/master from Kevin Risden
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ace423e ]

        SOLR-9077: Streaming expressions should support collection alias

        Show
        jira-bot ASF subversion and git services added a comment - Commit ace423e958182aa8ad6329f5cc1dc3ca6cd877c7 in lucene-solr's branch refs/heads/master from Kevin Risden [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ace423e ] SOLR-9077 : Streaming expressions should support collection alias
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit e3db9f3b8a28e1de0b6fcd5cb358a948f7a23423 in lucene-solr's branch refs/heads/branch_6x from Kevin Risden
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=e3db9f3 ]

        SOLR-9077: Streaming expressions should support collection alias

        Show
        jira-bot ASF subversion and git services added a comment - Commit e3db9f3b8a28e1de0b6fcd5cb358a948f7a23423 in lucene-solr's branch refs/heads/branch_6x from Kevin Risden [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=e3db9f3 ] SOLR-9077 : Streaming expressions should support collection alias

          People

          • Assignee:
            risdenk Kevin Risden
            Reporter:
            suds.s Suds
          • Votes:
            3 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development