Apache Blur
  1. Apache Blur
  2. BLUR-49

BlurAnalyzer add constructor arguments

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: experimental-dev
    • Fix Version/s: experimental-dev
    • Component/s: None
    • Labels:
      None

      Description

      In the 0.2-dev branch, the BlurAnalyzer should be enhanced to allow constructor arguments to be passed in the ClassDefinition.arguments map. Not sure what the best way to implement this, but the use case would be creating a StandardAnalyzer with a non-default set of stop words.

        Activity

        Hide
        Aaron McCurry added a comment -

        Patch was applied, analyzers can now be instantiated by writing some javascript to be executed on the server side.

        Show
        Aaron McCurry added a comment - Patch was applied, analyzers can now be instantiated by writing some javascript to be executed on the server side.
        Hide
        Aaron McCurry added a comment -

        Excellent, this is exactly what was needed. Sorry it took so long to review. Thanks!

        Aaron

        https://git-wip-us.apache.org/repos/asf?p=incubator-blur.git;a=commit;h=b14bde717147062d88cdacf4700b38759aa74fa5

        Show
        Aaron McCurry added a comment - Excellent, this is exactly what was needed. Sorry it took so long to review. Thanks! Aaron https://git-wip-us.apache.org/repos/asf?p=incubator-blur.git;a=commit;h=b14bde717147062d88cdacf4700b38759aa74fa5
        Hide
        Aaron McCurry added a comment -

        I don't know what do you think? For a first cut I think that we could go with a hard coded function name until there's a need for something different. I will take a look at this patch as soon as I can. Sorry for the slow responses lately, I've busier than normal at work.

        Aaron

        Show
        Aaron McCurry added a comment - I don't know what do you think? For a first cut I think that we could go with a hard coded function name until there's a need for something different. I will take a look at this patch as soon as I can. Sorry for the slow responses lately, I've busier than normal at work. Aaron
        Hide
        Gagan Deep Juneja added a comment -

        Thanks Aaron for writing the example. So as per example the patch I have provided you is good to go.
        Please let me know if we need any change.

        But the way we have written code to invoke javascript function there is a restriction that the javascript function name must be "initAnalyzer" is that fine? or we should need to generalize this?

        Regards,
        Gagan

        Show
        Gagan Deep Juneja added a comment - Thanks Aaron for writing the example. So as per example the patch I have provided you is good to go. Please let me know if we need any change. But the way we have written code to invoke javascript function there is a restriction that the javascript function name must be "initAnalyzer" is that fine? or we should need to generalize this? Regards, Gagan
        Hide
        Aaron McCurry added a comment -

        I created a gist example. Let me know what you think.

        https://gist.github.com/amccurry/4752207

        Show
        Aaron McCurry added a comment - I created a gist example. Let me know what you think. https://gist.github.com/amccurry/4752207
        Hide
        Aaron McCurry added a comment -

        Yes, I like examples as well. I will have time tomorrow to put an example together.

        Aaron

        Show
        Aaron McCurry added a comment - Yes, I like examples as well. I will have time tomorrow to put an example together. Aaron
        Hide
        Gagan Deep Juneja added a comment -

        I got the sense out of it but didnt understand completely. Could you please take an example so that it would be more clear.

        Regards,
        Gagan

        Show
        Gagan Deep Juneja added a comment - I got the sense out of it but didnt understand completely. Could you please take an example so that it would be more clear. Regards, Gagan
        Hide
        Aaron McCurry added a comment -

        So with point 2, at the end of the constructing a the bluranalyzer for each table (there should be only one for each table) it should be ready to go and not require any further changes. So currently we do this by passing in the analyzer definition (thrift analyzer struct) Into the constructor. There may be multiple fields with their own initFunctions each one producing (potentially) different analyzers. So my thought was that while it was constructing the normal analyzer by class name, that the initFunction would logically go with that block of code because it is an alternative way to create a user defined analyzer. Let me know if doesn't make sense, I may be confused.

        Aaron

        Show
        Aaron McCurry added a comment - So with point 2, at the end of the constructing a the bluranalyzer for each table (there should be only one for each table) it should be ready to go and not require any further changes. So currently we do this by passing in the analyzer definition (thrift analyzer struct) Into the constructor. There may be multiple fields with their own initFunctions each one producing (potentially) different analyzers. So my thought was that while it was constructing the normal analyzer by class name, that the initFunction would logically go with that block of code because it is an alternative way to create a user defined analyzer. Let me know if doesn't make sense, I may be confused. Aaron
        Hide
        Gagan Deep Juneja added a comment -

        1)Yes I agree with your comment 2 and updated code as well
        2)I didn't understand point 1, are u not able to see initFunction member of struct ClassDefinition in Blur.thrift?

        I am attaching the latest patch with only three files. I am not adding generated files to patch.

        Please review and let me know your feedback.

        Regards,
        Gagan

        Show
        Gagan Deep Juneja added a comment - 1)Yes I agree with your comment 2 and updated code as well 2)I didn't understand point 1, are u not able to see initFunction member of struct ClassDefinition in Blur.thrift? I am attaching the latest patch with only three files. I am not adding generated files to patch. Please review and let me know your feedback. Regards, Gagan
        Hide
        Aaron McCurry added a comment -

        Overall the patch looks really good. I do have a couple of issues.

        1. There seems to be a few fields that are marked as required in the Blur.thrift file but I didn't see the changes in that file, because when I regenerate the code the generated code changes slightly.

        2. I think the ClassDefinition constructor needs to be removed and the code in the constructor placed into the convert method. Perhaps create a method that creates the analyzerClassObject that if the ClassDefinition.initFunction is non-null it calls the javascript code. If the ClassDefinition.className is non-null then is just calls the normal code that is already there. If both are populated, WARN in the log that the className is ignored and call the initFunction code. That way once the creation of the BlurAnalyzer is complete it's ready for use across all fields.

        What do you think?

        Aaron

        Show
        Aaron McCurry added a comment - Overall the patch looks really good. I do have a couple of issues. 1. There seems to be a few fields that are marked as required in the Blur.thrift file but I didn't see the changes in that file, because when I regenerate the code the generated code changes slightly. 2. I think the ClassDefinition constructor needs to be removed and the code in the constructor placed into the convert method. Perhaps create a method that creates the analyzerClassObject that if the ClassDefinition.initFunction is non-null it calls the javascript code. If the ClassDefinition.className is non-null then is just calls the normal code that is already there. If both are populated, WARN in the log that the className is ignored and call the initFunction code. That way once the creation of the BlurAnalyzer is complete it's ready for use across all fields. What do you think? Aaron
        Hide
        Aaron McCurry added a comment -

        Thanks Gagan, I totally understand other commitments. I will review in the next day or so. Thanks again!

        Aaron

        Show
        Aaron McCurry added a comment - Thanks Gagan, I totally understand other commitments. I will review in the next day or so. Thanks again! Aaron
        Hide
        Gagan Deep Juneja added a comment -

        I tried to implement solution as discussed. Please review the patch and let me know your feedback.

        Sorry for the delay I was busy with some other professional commitments.

        Regards,
        Gagan

        Show
        Gagan Deep Juneja added a comment - I tried to implement solution as discussed. Please review the patch and let me know your feedback. Sorry for the delay I was busy with some other professional commitments. Regards, Gagan
        Hide
        Aaron McCurry added a comment -

        Yes I think we can move forward.

        In the Thrift definition file (now found src/distribution/src/main/scripts/interface/Blur.thrift) we should modify the ClassDefinition struct to look something like:

        struct ClassDefinition

        { 1:string className, 2:map<string,string> arguments, 3:string initFunction }

        NOTE: To regenerate the Thrift RPC code run the genAndReplace.sh script.

        The initFunction would where we would pass in the javascript function. Maybe an example initFunction would like something like:

        function initAnalyzer(arguments)

        { return new org.apache.lucene.analysis.standard.StandardAnalyzer(org.apache.lucene.util.Version.LUCENE_40); }

        This function would create a standard analyzer and the arguments in the case aren't used.

        That do you think?

        Show
        Aaron McCurry added a comment - Yes I think we can move forward. In the Thrift definition file (now found src/distribution/src/main/scripts/interface/Blur.thrift) we should modify the ClassDefinition struct to look something like: struct ClassDefinition { 1:string className, 2:map<string,string> arguments, 3:string initFunction } NOTE: To regenerate the Thrift RPC code run the genAndReplace.sh script. The initFunction would where we would pass in the javascript function. Maybe an example initFunction would like something like: function initAnalyzer(arguments) { return new org.apache.lucene.analysis.standard.StandardAnalyzer(org.apache.lucene.util.Version.LUCENE_40); } This function would create a standard analyzer and the arguments in the case aren't used. That do you think?
        Hide
        Gagan Deep Juneja added a comment -

        I think that point 1 looks like more generic. Can we go ahead with implementation? Please explain a bit on point 1, I understand the intent but can you please explain where do we pass in a javascript function.

        Show
        Gagan Deep Juneja added a comment - I think that point 1 looks like more generic. Can we go ahead with implementation? Please explain a bit on point 1, I understand the intent but can you please explain where do we pass in a javascript function.
        Hide
        Aaron McCurry added a comment -

        Yeah we need a way to initialize analyzers, and I think that Thrift is going to be tough to make a generic solution. I have a couple of ideas to date, would love to hear any others.

        1. Pass in a javascript function to initialize the analyzers, since this code is only called when the table is opened or during server restart it doesn't have to be high performance.
        2. Create an abstract initializer class that can except the arguments and instantiate the new analyzer. This would force use to create some defaults for each of the standard analyzers, but we would also have to allow users to easily provide their own classes.

        At this point I like the first option better, but I really would like to hear other people's opinions.

        Show
        Aaron McCurry added a comment - Yeah we need a way to initialize analyzers, and I think that Thrift is going to be tough to make a generic solution. I have a couple of ideas to date, would love to hear any others. 1. Pass in a javascript function to initialize the analyzers, since this code is only called when the table is opened or during server restart it doesn't have to be high performance. 2. Create an abstract initializer class that can except the arguments and instantiate the new analyzer. This would force use to create some defaults for each of the standard analyzers, but we would also have to allow users to easily provide their own classes. At this point I like the first option better, but I really would like to hear other people's opinions.
        Hide
        Gagan Deep Juneja added a comment -

        I tried some solution for this like.
        public BlurAnalyzer(org.apache.blur.thrift.generated.ClassDefinition classDefinition)

        { Set<String> stopWordsSet = classDefinition.getArguments().keySet(); _defaultAnalyzer = new StandardAnalyzer(LUCENE_VERSION,stopWordsSet); }

        Is this the only use case we need to handle or do we need to make this generic?

        Gagan

        Show
        Gagan Deep Juneja added a comment - I tried some solution for this like. public BlurAnalyzer(org.apache.blur.thrift.generated.ClassDefinition classDefinition) { Set<String> stopWordsSet = classDefinition.getArguments().keySet(); _defaultAnalyzer = new StandardAnalyzer(LUCENE_VERSION,stopWordsSet); } Is this the only use case we need to handle or do we need to make this generic? Gagan

          People

          • Assignee:
            Unassigned
            Reporter:
            Aaron McCurry
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development