Solr
  1. Solr
  2. SOLR-993

VariableResolverImpl addNamespace overwrites entire namespace instead of adding

    Details

      Description

      The addNamespace method in VariableResolverImpl does not so much add the namespace as overwrite it.

      1. SOLR-993c.patch
        11 kB
        Jared Flatow
      2. SOLR-993c.patch
        11 kB
        Jared Flatow
      3. SOLR-993c.patch
        11 kB
        Shalin Shekhar Mangar
      4. SOLR-993b.patch
        11 kB
        Jared Flatow
      5. SOLR-993.patch
        6 kB
        Jared Flatow
      6. SOLR-993.patch
        3 kB
        Noble Paul

        Activity

        Hide
        Uwe Schindler added a comment -

        Move issue to Solr 4.9.

        Show
        Uwe Schindler added a comment - Move issue to Solr 4.9.
        Hide
        Steve Rowe added a comment -

        Bulk move 4.4 issues to 4.5 and 5.0

        Show
        Steve Rowe added a comment - Bulk move 4.4 issues to 4.5 and 5.0
        Hide
        Robert Muir added a comment -

        moving all 4.0 issues not touched in a month to 4.1

        Show
        Robert Muir added a comment - moving all 4.0 issues not touched in a month to 4.1
        Hide
        Robert Muir added a comment -

        rmuir20120906-bulk-40-change

        Show
        Robert Muir added a comment - rmuir20120906-bulk-40-change
        Hide
        Hoss Man added a comment -

        bulk fixing the version info for 4.0-ALPHA and 4.0 all affected issues have "hoss20120711-bulk-40-change" in comment

        Show
        Hoss Man added a comment - bulk fixing the version info for 4.0-ALPHA and 4.0 all affected issues have "hoss20120711-bulk-40-change" in comment
        Hide
        Robert Muir added a comment -

        3.4 -> 3.5

        Show
        Robert Muir added a comment - 3.4 -> 3.5
        Hide
        Robert Muir added a comment -

        Bulk move 3.2 -> 3.3

        Show
        Robert Muir added a comment - Bulk move 3.2 -> 3.3
        Hide
        Hoss Man added a comment -

        Bulk updating 240 Solr issues to set the Fix Version to "next" per the process outlined in this email...

        http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E

        Selection criteria was "Unresolved" with a Fix Version of 1.5, 1.6, 3.1, or 4.0. email notifications were suppressed.

        A unique token for finding these 240 issues in the future: hossversioncleanup20100527

        Show
        Hoss Man added a comment - Bulk updating 240 Solr issues to set the Fix Version to "next" per the process outlined in this email... http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E Selection criteria was "Unresolved" with a Fix Version of 1.5, 1.6, 3.1, or 4.0. email notifications were suppressed. A unique token for finding these 240 issues in the future: hossversioncleanup20100527
        Hide
        Noble Paul added a comment -

        there is no compelling reason to change the behavior now. we can take a relook at this in 1.5

        Show
        Noble Paul added a comment - there is no compelling reason to change the behavior now. we can take a relook at this in 1.5
        Hide
        Noble Paul added a comment -

        Hi, Jared,
        addToNamespace() is just one part of the issue. Let us open another issue for the semantics change.

        but it is still not clear to me how the API is meant to be used

        This can be documented well . It should not be a big problem.

        Show
        Noble Paul added a comment - Hi, Jared, addToNamespace() is just one part of the issue. Let us open another issue for the semantics change. but it is still not clear to me how the API is meant to be used This can be documented well . It should not be a big problem.
        Hide
        Jared Flatow added a comment -

        Shalin: yes I saw it, I still feel the underlying issues are just as important if not moreso. addToNamespace is probably an improvement in the API, but it is still not clear to me how the API is meant to be used, which I can only imagine is true for others as well. I think making the semantics for using the VariableResolver well-defined is the most important part of the patch, as well as this discussion.

        Show
        Jared Flatow added a comment - Shalin: yes I saw it, I still feel the underlying issues are just as important if not moreso. addToNamespace is probably an improvement in the API, but it is still not clear to me how the API is meant to be used, which I can only imagine is true for others as well. I think making the semantics for using the VariableResolver well-defined is the most important part of the patch, as well as this discussion.
        Hide
        Shalin Shekhar Mangar added a comment -

        Jared, did you get a chance to see the latest patch?

        Show
        Shalin Shekhar Mangar added a comment - Jared, did you get a chance to see the latest patch?
        Hide
        Noble Paul added a comment -

        new method added addToNamespace w/o making any other changes

        Show
        Noble Paul added a comment - new method added addToNamespace w/o making any other changes
        Hide
        Noble Paul added a comment -

        thanks shalin . it got applied

        Show
        Noble Paul added a comment - thanks shalin . it got applied
        Hide
        Shalin Shekhar Mangar added a comment -

        Same as Jared's patch but generated with svn diff. Noble, I think tortoise svn is having problems applying patch generated by git. Try this one instead.

        Show
        Shalin Shekhar Mangar added a comment - Same as Jared's patch but generated with svn diff. Noble, I think tortoise svn is having problems applying patch generated by git. Try this one instead.
        Hide
        Noble Paul added a comment -

        The addToNamespace method relies on the new getOrInsertNamespace, and changing the others to use this method also ensures that they will be consistent with one another. Perhaps the logging issue should be in a separate patch

        we can easily implement addToNamespace w/o the other changes you have made . I wish that the behavior change be implemented in a separate issue. It needs better review. I am still not convinced about the correctness of the implementation

        BTW, I am still unable to apply the patch using tortoise svn .

        Show
        Noble Paul added a comment - The addToNamespace method relies on the new getOrInsertNamespace, and changing the others to use this method also ensures that they will be consistent with one another. Perhaps the logging issue should be in a separate patch we can easily implement addToNamespace w/o the other changes you have made . I wish that the behavior change be implemented in a separate issue. It needs better review. I am still not convinced about the correctness of the implementation BTW, I am still unable to apply the patch using tortoise svn .
        Hide
        Jared Flatow added a comment -

        The patch would not apply if it has the patch like a/contrib/dataimporthandler/ .it should not have that "a" in the beginning.

        Use patch -p1 (or whatever tool you are using, make the patchlevel 1). In case somehow your tool is unable to do that, I am attaching a patch without the prefixes.

        it still does not allow other characters in the namespace. We may need to discuss more about this

        Sure, which other characters should be allowed?

        I dont see the need for so many changes if we are dealing with things separately. especially the resolve method

        ISTM the changes are all interrelated. The addToNamespace method relies on the new getOrInsertNamespace, and changing the others to use this method also ensures that they will be consistent with one another. Perhaps the logging issue should be in a separate patch...?

        Show
        Jared Flatow added a comment - The patch would not apply if it has the patch like a/contrib/dataimporthandler/ .it should not have that "a" in the beginning. Use patch -p1 (or whatever tool you are using, make the patchlevel 1). In case somehow your tool is unable to do that, I am attaching a patch without the prefixes. it still does not allow other characters in the namespace. We may need to discuss more about this Sure, which other characters should be allowed? I dont see the need for so many changes if we are dealing with things separately. especially the resolve method ISTM the changes are all interrelated. The addToNamespace method relies on the new getOrInsertNamespace, and changing the others to use this method also ensures that they will be consistent with one another. Perhaps the logging issue should be in a separate patch...?
        Hide
        Noble Paul added a comment -
        1. The patch would not apply if it has the patch like a/contrib/dataimporthandler/ .it should not have that "a" in the beginning.
        2. it still does not allow other characters in the namespace. We may need to discuss more about this
        3. did we open another issue for the behavior change in VariableResolver? I dont see the need for so many changes if we are dealing with things separately. especially the resolve method
        Show
        Noble Paul added a comment - The patch would not apply if it has the patch like a/contrib/dataimporthandler/ .it should not have that "a" in the beginning. it still does not allow other characters in the namespace. We may need to discuss more about this did we open another issue for the behavior change in VariableResolver? I dont see the need for so many changes if we are dealing with things separately. especially the resolve method
        Hide
        Jared Flatow added a comment -

        I updated the patch to work off the trunk. The full set of changes in this patch are as follows:

        1. get rid of no-op call to removeNamespace(null) in DocBuilder
        2. warn about all unresolved variables from TemplateString
        3. add a getOrInsertNamespace method to VariableResolverImpl, which returns the named namespace, creating it if it doesn't already exist. addNamespace, addToNamespace and removeNamespace are now implemented in terms of this method
        4. add a splitName method to VariableResolverImpl which breaks a name into its hierarchical components (discussed in detail here). getOrInsertNamespace uses this method, as does resolve
        5. modify TemplateTransformer to use the new VariableResolverImpl methods and not to warn about unresolved variables (since this was moved to TemplateString)

        Show
        Jared Flatow added a comment - I updated the patch to work off the trunk. The full set of changes in this patch are as follows: 1. get rid of no-op call to removeNamespace(null) in DocBuilder 2. warn about all unresolved variables from TemplateString 3. add a getOrInsertNamespace method to VariableResolverImpl, which returns the named namespace, creating it if it doesn't already exist. addNamespace, addToNamespace and removeNamespace are now implemented in terms of this method 4. add a splitName method to VariableResolverImpl which breaks a name into its hierarchical components (discussed in detail here). getOrInsertNamespace uses this method, as does resolve 5. modify TemplateTransformer to use the new VariableResolverImpl methods and not to warn about unresolved variables (since this was moved to TemplateString)
        Hide
        Jared Flatow added a comment -

        Jared, can you please update the patch for trunk? I couldn't apply the patch. I think the path is not correct or there is a conflict.

        Sure, I will update again and check, but I believe the change to StringBuilder from StringBuffer may cause the conflict, since that occurs in the mergeAll method, which no longer exists in the patched version.

        We are splitting by '.' and going into each map we find. Will this reach upto a '.' character in a float or string value given as a parameter to a function?

        The RE matches all word characters from the beginning of the previous match to the next '.' or the end of the input. Whatever is left over at the end is considered the last token. Therefore the name of the namespace itself must be all word characters, while the 'variable' can be anything. This is the reason the patch is unambiguous as opposed to the current implementation. As it is now, it will '.' split the name, trying to lookup each one, if it is not found it will try to merge tokens back together and look it up again. This is highly dependent on the context the name is being resolved in, whereas the patch always results in the same split boundaries for a given name. Additionally, it has the nice property that the parent namespaces can be determined without checking to see if they exist.

        Show
        Jared Flatow added a comment - Jared, can you please update the patch for trunk? I couldn't apply the patch. I think the path is not correct or there is a conflict. Sure, I will update again and check, but I believe the change to StringBuilder from StringBuffer may cause the conflict, since that occurs in the mergeAll method, which no longer exists in the patched version. We are splitting by '.' and going into each map we find. Will this reach upto a '.' character in a float or string value given as a parameter to a function? The RE matches all word characters from the beginning of the previous match to the next '.' or the end of the input. Whatever is left over at the end is considered the last token. Therefore the name of the namespace itself must be all word characters, while the 'variable' can be anything. This is the reason the patch is unambiguous as opposed to the current implementation. As it is now, it will '.' split the name, trying to lookup each one, if it is not found it will try to merge tokens back together and look it up again. This is highly dependent on the context the name is being resolved in, whereas the patch always results in the same split boundaries for a given name. Additionally, it has the nice property that the parent namespaces can be determined without checking to see if they exist.
        Hide
        Shalin Shekhar Mangar added a comment -

        Jared, can you please update the patch for trunk? I couldn't apply the patch. I think the path is not correct or there is a conflict.

        It should, and at least for the formatDate format string, it does. Which function are you using to test?

        for (String part : splitName(variable)) {
                value = namespace.get(part);
                namespace = (value instanceof Map) ? (Map) value : Collections.EMPTY_MAP;
        }
        

        I did not apply the patch but I was looking at the above code. We are splitting by '.' and going into each map we find. Will this reach upto a '.' character in a float or string value given as a parameter to a function?

        I think we should take a step back here. What is it that you are trying to achieve? There are multiple things that you are trying to address:

        1. addToNamespace method in variable resolver
        2. Change behavior of variable resolver (why? not very clear to me). In this process we are changing working code and disallowing some characters from appearing in variable names (
          w means no hyphens etc.) and no '.' in dynamic maps. These are not big changes but I want to understand why we need these changes?

        It is better to focus on one issue at a time. Let us use this issue for the addToNamespace method which we all agree to be useful. Let us create another issue for the discussion on the behavior of variable resolver.

        Show
        Shalin Shekhar Mangar added a comment - Jared, can you please update the patch for trunk? I couldn't apply the patch. I think the path is not correct or there is a conflict. It should, and at least for the formatDate format string, it does. Which function are you using to test? for ( String part : splitName(variable)) { value = namespace.get(part); namespace = (value instanceof Map) ? (Map) value : Collections.EMPTY_MAP; } I did not apply the patch but I was looking at the above code. We are splitting by '.' and going into each map we find. Will this reach upto a '.' character in a float or string value given as a parameter to a function? I think we should take a step back here. What is it that you are trying to achieve? There are multiple things that you are trying to address: addToNamespace method in variable resolver Change behavior of variable resolver (why? not very clear to me). In this process we are changing working code and disallowing some characters from appearing in variable names ( w means no hyphens etc.) and no '.' in dynamic maps. These are not big changes but I want to understand why we need these changes? It is better to focus on one issue at a time. Let us use this issue for the addToNamespace method which we all agree to be useful. Let us create another issue for the discussion on the behavior of variable resolver.
        Hide
        Jared Flatow added a comment -

        the way the resolve method is implemented, it won't work with passing float values or a string containing '.' to a function.

        Why not? It should, and at least for the formatDate format string, it does. Which function are you using to test?

        Show
        Jared Flatow added a comment - the way the resolve method is implemented, it won't work with passing float values or a string containing '.' to a function. Why not? It should, and at least for the formatDate format string, it does. Which function are you using to test?
        Hide
        Shalin Shekhar Mangar added a comment -

        Are there any cases of dynamic maps right now that don't look like function calls?

        No but do remember that VariableResolver is agnostic of functions. They are resolved just like any other variable.

        Is it unreasonable to say you can't use '.' characters in naming those functions? If you really want to call "a.b.c.d.e(...)", then have your dynamic namespace be "a.b.c.d" with a function "e(...)".

        I guess not.

        WRT iterating over the maps: to be clear, I would prefer the implementation be changed to how the patch is now

        We need to change the patch slightly. In particular, the way the resolve method is implemented, it won't work with passing float values or a string containing '.' to a function.

        Please ignore the comment about false alarms. I'm fine with the idea of logging failures in resolving variables.

        Show
        Shalin Shekhar Mangar added a comment - Are there any cases of dynamic maps right now that don't look like function calls? No but do remember that VariableResolver is agnostic of functions. They are resolved just like any other variable. Is it unreasonable to say you can't use '.' characters in naming those functions? If you really want to call "a.b.c.d.e(...)", then have your dynamic namespace be "a.b.c.d" with a function "e(...)". I guess not. WRT iterating over the maps: to be clear, I would prefer the implementation be changed to how the patch is now We need to change the patch slightly. In particular, the way the resolve method is implemented, it won't work with passing float values or a string containing '.' to a function. Please ignore the comment about false alarms. I'm fine with the idea of logging failures in resolving variables.
        Hide
        Jared Flatow added a comment -

        In that case if I add a Map to a namespace, you won't be able to iterate over the keys and find out ones which have a '.' character. How do we handle those cases?

        Are there any cases of dynamic maps right now that don't look like function calls? In my mind, any dynamic variables should look like "a.b.c.d(...)", where "a.b.c" is the namespace in which you can find the variable/function "d(...)". I think this is consistent with the way these are added to namespaces right now? The only problem is if you want to have the function called "d.e(...)" in the namespace "a.b.c", which I don't think should be supported. Is it unreasonable to say you can't use '.' characters in naming those functions? If you really want to call "a.b.c.d.e(...)", then have your dynamic namespace be "a.b.c.d" with a function "e(...)".

        WRT iterating over the maps: to be clear, I would prefer the implementation be changed to how the patch is now: if you add "b.c" to "a", then thats how you have to retrieve it; therefore you should add "c" to "a.b" if you want "a.b.c" to be resolvable. Perhaps if you have a specific map which you know will have '.' separated names (like request), there should be a flag in the add* methods for whether to check or not.

        Nonetheless I understand that I might be the only one that cares about these things and that these changes are not backwards compatible. Still, the API explicitly mentions that it is unstable so perhaps changing it slightly is not a big deal. I think this will make some steps towards solidifying the API. The other thing the patch does towards this goal is to make consistent use of what the null and empty string "" namespace refer to (the patch follows the convention that null is the top-level namespace, and empty string is treated the same as any other string).

        I think this is a good change to make. However, we need to be careful with this. With the current way of resolving variables, one may get a lot of false alarms.

        How so?

        Show
        Jared Flatow added a comment - In that case if I add a Map to a namespace, you won't be able to iterate over the keys and find out ones which have a '.' character. How do we handle those cases? Are there any cases of dynamic maps right now that don't look like function calls? In my mind, any dynamic variables should look like "a.b.c.d(...)", where "a.b.c" is the namespace in which you can find the variable/function "d(...)". I think this is consistent with the way these are added to namespaces right now? The only problem is if you want to have the function called "d.e(...)" in the namespace "a.b.c", which I don't think should be supported. Is it unreasonable to say you can't use '.' characters in naming those functions? If you really want to call "a.b.c.d.e(...)", then have your dynamic namespace be "a.b.c.d" with a function "e(...)". WRT iterating over the maps: to be clear, I would prefer the implementation be changed to how the patch is now: if you add "b.c" to "a", then thats how you have to retrieve it; therefore you should add "c" to "a.b" if you want "a.b.c" to be resolvable. Perhaps if you have a specific map which you know will have '.' separated names (like request), there should be a flag in the add* methods for whether to check or not. Nonetheless I understand that I might be the only one that cares about these things and that these changes are not backwards compatible. Still, the API explicitly mentions that it is unstable so perhaps changing it slightly is not a big deal. I think this will make some steps towards solidifying the API. The other thing the patch does towards this goal is to make consistent use of what the null and empty string "" namespace refer to (the patch follows the convention that null is the top-level namespace, and empty string is treated the same as any other string). I think this is a good change to make. However, we need to be careful with this. With the current way of resolving variables, one may get a lot of false alarms. How so?
        Hide
        Shalin Shekhar Mangar added a comment - - edited

        Jared, I can see your point. However I am not sure how your suggestion will work with dynamic maps where the values are computed based on the key provided to Map#get method. In that case if I add a Map to a namespace, you won't be able to iterate over the keys and find out ones which have a '.' character. How do we handle those cases?

        On the whole, I understand the confusion and I'm fine either way but I'm not sure what value we are adding by making this change.

        Whatever you decide I hope you will also keep the log warning when a variable fails to be resolved. That will at least give a clue if template variable resolution is not working as you expect.

        I think this is a good change to make. However, we need to be careful with this. With the current way of resolving variables, one may get a lot of false alarms.

        Show
        Shalin Shekhar Mangar added a comment - - edited Jared, I can see your point. However I am not sure how your suggestion will work with dynamic maps where the values are computed based on the key provided to Map#get method. In that case if I add a Map to a namespace, you won't be able to iterate over the keys and find out ones which have a '.' character. How do we handle those cases? On the whole, I understand the confusion and I'm fine either way but I'm not sure what value we are adding by making this change. Whatever you decide I hope you will also keep the log warning when a variable fails to be resolved. That will at least give a clue if template variable resolution is not working as you expect. I think this is a good change to make. However, we need to be careful with this. With the current way of resolving variables, one may get a lot of false alarms.
        Hide
        Jared Flatow added a comment -

        If I add it after splitting the names , how will I remove the namespace "dataimporter.request" completely without removing other variables which were added with addTonameSpace separately?

        I'm not clear why this is different than the way it is now? If "dataimporter.request.a.b" actually refers to a mapping with key "a.b" in the "dataimporter.request" namespace, or a mapping with key "b" in the "dataimporter.request.a" namespace is irrelevant in the case where you want to completely remove the namespace "datimporter.request". The difference is that as it is implemented now, "dataimporter.request.a.b" is ambiguous as to whether it refers to one of 4 toplevel namespaces: "dataimporter", "dataimporter.request", "dataimporter.request.a" or "datimporter.request.a.b". Actually, the problem lies in the current implementation since some of the parameters could potentially be stored in different choices of namespace, thus rendering the others effectively inaccessible. I'm just saying the hierarchy should be somehow enforced to remove that amibguity. By enforcing the hierarchy you ensure that all the request parameters are actually stored in the same namespace.

        In general I think you would want to prevent people from putting variables into reserved namespaces like "dataimport.request" by documenting that they are reserved for request parameters, that is fair warning that it is not necessarily a safe place to keep variables. Either way, the intention of removing the whole namespace should be the same.

        Whatever you decide I hope you will also keep the log warning when a variable fails to be resolved. That will at least give a clue if template variable resolution is not working as you expect.

        Show
        Jared Flatow added a comment - If I add it after splitting the names , how will I remove the namespace "dataimporter.request" completely without removing other variables which were added with addTonameSpace separately? I'm not clear why this is different than the way it is now? If "dataimporter.request.a.b" actually refers to a mapping with key "a.b" in the "dataimporter.request" namespace, or a mapping with key "b" in the "dataimporter.request.a" namespace is irrelevant in the case where you want to completely remove the namespace "datimporter.request". The difference is that as it is implemented now, "dataimporter.request.a.b" is ambiguous as to whether it refers to one of 4 toplevel namespaces: "dataimporter", "dataimporter.request", "dataimporter.request.a" or "datimporter.request.a.b". Actually, the problem lies in the current implementation since some of the parameters could potentially be stored in different choices of namespace, thus rendering the others effectively inaccessible. I'm just saying the hierarchy should be somehow enforced to remove that amibguity. By enforcing the hierarchy you ensure that all the request parameters are actually stored in the same namespace. In general I think you would want to prevent people from putting variables into reserved namespaces like "dataimport.request" by documenting that they are reserved for request parameters, that is fair warning that it is not necessarily a safe place to keep variables. Either way, the intention of removing the whole namespace should be the same. Whatever you decide I hope you will also keep the log warning when a variable fails to be resolved. That will at least give a clue if template variable resolution is not working as you expect.
        Hide
        Noble Paul added a comment -

        In that case, what about splitting the names that are added as you previously suggested, so that "c.d" adds "a.b.c", then "a.b.c.d"?

        Take the case of "dataimporter.request" (request parameters) namespace. This namespace is very likely to contain parameters with '.' (dot) in that . If I add it after splitting the names , how will I remove the namespace "dataimporter.request" completely without removing other variables which were added with addTonameSpace separately?

        I guess we can add the new method addToNamespace with the current semantics. The ambiguity may not be really as bad as long as it is documented

        Show
        Noble Paul added a comment - In that case, what about splitting the names that are added as you previously suggested, so that "c.d" adds "a.b.c", then "a.b.c.d"? Take the case of "dataimporter.request" (request parameters) namespace. This namespace is very likely to contain parameters with '.' (dot) in that . If I add it after splitting the names , how will I remove the namespace "dataimporter.request" completely without removing other variables which were added with addTonameSpace separately? I guess we can add the new method addToNamespace with the current semantics. The ambiguity may not be really as bad as long as it is documented
        Hide
        Jared Flatow added a comment -

        In that case, what about splitting the names that are added as you previously suggested, so that "c.d" adds "a.b.c", then "a.b.c.d"? At least, for the addToNamespace(name, key, value) method. If you think it is useful to have that behavior, it makes sense, and at least it will be well-defined. Right now, the fact that the lookup of "a.b.c.d" relies on "a.b.c" not existing seems really bad.

        Show
        Jared Flatow added a comment - In that case, what about splitting the names that are added as you previously suggested, so that "c.d" adds "a.b.c", then "a.b.c.d"? At least, for the addToNamespace(name, key, value) method. If you think it is useful to have that behavior, it makes sense, and at least it will be well-defined. Right now, the fact that the lookup of "a.b.c.d" relies on "a.b.c" not existing seems really bad.
        Hide
        Noble Paul added a comment -

        Yeah, so: <word>.<another_word>.<one_more_word> will always do a hierarchical lookup

        In the current implementation , it is not 'always' a hierarchical lookup. I guess it is useful that way.

        I think if you add 'c.d' as a key of a mapping into the namespace 'a.b', the only way to resolve it should be to get the 'a.b' namespace and then get the key 'c.d' (i.e. the same behavior for putting it in as for taking it out ....

        invoking variableresolver thru an API is not the most common usecase. When a variable is resolved , I wish the behavior to be intuitive. if someone puts a leaf value with "c.d " in the namespace "a.b" I wish it to fetch the actual value of "c.d" (which it does currently) . If he also has a namespace called "c" inside "a.b" then there is ambiguity , but that is less common and the current behavior seems to be OK.

        What you are asking now is to change the current behavior of VariableResolver in a non-backcompat way and I cannot see the value. I agree with adding the method addToNameSpace() . But , I am not comfortable changing the way the lookup is done.

        Show
        Noble Paul added a comment - Yeah, so: <word>.<another_word>.<one_more_word> will always do a hierarchical lookup In the current implementation , it is not 'always' a hierarchical lookup. I guess it is useful that way. I think if you add 'c.d' as a key of a mapping into the namespace 'a.b', the only way to resolve it should be to get the 'a.b' namespace and then get the key 'c.d' (i.e. the same behavior for putting it in as for taking it out .... invoking variableresolver thru an API is not the most common usecase. When a variable is resolved , I wish the behavior to be intuitive. if someone puts a leaf value with "c.d " in the namespace "a.b" I wish it to fetch the actual value of "c.d" (which it does currently) . If he also has a namespace called "c" inside "a.b" then there is ambiguity , but that is less common and the current behavior seems to be OK. What you are asking now is to change the current behavior of VariableResolver in a non-backcompat way and I cannot see the value. I agree with adding the method addToNameSpace() . But , I am not comfortable changing the way the lookup is done.
        Hide
        Jared Flatow added a comment -

        can I add a name with '.' (dot) in the end? . I guess it should be possible

        Yeah, so: <word>.<another_word>.<one_more_word> will always do a hierarchical lookup. But if you have something like <word>.<another_word>.function(<another_namespace>, <another_namespace>, etc.), the last part unambiguously refers to a function because of the non-word characters (the parenthesis, commas, spaces). I think that is a reasonable definition, but you can always expand the class of word characters a little to include things like '/', ':', and '#'.

        so , what should be the behavior? if I add a namespace "c.d" to "a.b" it should create a new namespace "c" inside "b" and put "d" into it? if we do not do that it is still possible to have ambiguity.

        I think if you add 'c.d' as a key of a mapping into the namespace 'a.b', the only way to resolve it should be to get the 'a.b' namespace and then get the key 'c.d' (i.e. the same behavior for putting it in as for taking it out). Presumably if you are putting '.' into your key names you are asking for trouble, but the behavior is actually unambiguous. If you want to have a value 'd' in namespace 'a.b.c', then you should add to the 'a.b.c' namespace in the first place. If you add 'c.d' to 'a.b', then thats what you will end up with (i.e., there is no namespace 'a.b.c.d', only 'a.b' with a key 'c.d').

        Show
        Jared Flatow added a comment - can I add a name with '.' (dot) in the end? . I guess it should be possible Yeah, so: <word>.<another_word>.<one_more_word> will always do a hierarchical lookup. But if you have something like <word>.<another_word>.function(<another_namespace>, <another_namespace>, etc.), the last part unambiguously refers to a function because of the non-word characters (the parenthesis, commas, spaces). I think that is a reasonable definition, but you can always expand the class of word characters a little to include things like '/', ':', and '#'. so , what should be the behavior? if I add a namespace "c.d" to "a.b" it should create a new namespace "c" inside "b" and put "d" into it? if we do not do that it is still possible to have ambiguity. I think if you add 'c.d' as a key of a mapping into the namespace 'a.b', the only way to resolve it should be to get the 'a.b' namespace and then get the key 'c.d' (i.e. the same behavior for putting it in as for taking it out). Presumably if you are putting '.' into your key names you are asking for trouble, but the behavior is actually unambiguous. If you want to have a value 'd' in namespace 'a.b.c', then you should add to the 'a.b.c' namespace in the first place. If you add 'c.d' to 'a.b', then thats what you will end up with (i.e., there is no namespace 'a.b.c.d', only 'a.b' with a key 'c.d').
        Hide
        Noble Paul added a comment -

        Why shouldn't it work for the EvaluatorBag Functions? I am using the patch, as well as those functions...

        Actually , I did not study the patch correctly. sorry about that.

        I am not sure about one thing

          /*
           * Splits the name of a namespace into parts which contain only word characters.
           * The last part of a name may contain arbitrary characters, since it might refer to something other than a namespace.
           */
          private ArrayList<String> splitName(String name) {
        

        can I add a name with '.' (dot) in the end? . I guess it should be possible

        What if I have a namespace "a.b" containing a mapping with key "c.d", what does "a.b.c.d" refer to? The answer currently is it depends what other namespaces I have.

        so , what should be the behavior? if I add a namespace "c.d" to "a.b" it should create a new namespace "c" inside "b" and put "d" into it? if we do not do that it is still possible to have ambiguity.

        Show
        Noble Paul added a comment - Why shouldn't it work for the EvaluatorBag Functions? I am using the patch, as well as those functions... Actually , I did not study the patch correctly. sorry about that. I am not sure about one thing /* * Splits the name of a namespace into parts which contain only word characters. * The last part of a name may contain arbitrary characters, since it might refer to something other than a namespace. */ private ArrayList< String > splitName( String name) { can I add a name with '.' (dot) in the end? . I guess it should be possible What if I have a namespace "a.b" containing a mapping with key "c.d", what does "a.b.c.d" refer to? The answer currently is it depends what other namespaces I have. so , what should be the behavior? if I add a namespace "c.d" to "a.b" it should create a new namespace "c" inside "b" and put "d" into it? if we do not do that it is still possible to have ambiguity.
        Hide
        Jared Flatow added a comment -

        This patch is WRONG. It would not work for the functions namespace. Because functions implements a custom map. look at EvaluatorBag.getFunctionsNameSpace().

        Why shouldn't it work for the EvaluatorBag Functions? I am using the patch, as well as those functions...

        Moreover ,your version of the VariableResolverImpl is more expensive and creates more objects. Map.putAll() is far more expensive because it creates so many Map.Entry and removals are more costly.

        Given the size of the namespaces I doubt there is a major performance penalty (it only creates "so many Map.Entry"s if they exist in the namespace already), but if that is a concern it can be avoided by getting the parent namespace first.

        Can you tell me what are you trying to achieve. If you can give me usecases It may help me solve it better.

        I want to define the behavior of the VariableResolver better so it is more usable. There are significant ambiguities with the existing implementation. The "DOT_SPLIT and backup" of namespace names is incorrect. What if I have a namespace "a.b" containing a mapping with key "c.d", what does "a.b.c.d" refer to? The answer currently is it depends what other namespaces I have.

        Ideally, I would like to see the VariableResolver interface define the correct way to put values in/out of a namespace, and tell you which values are going to be overwritten when. This will make writing transformers more straightforward.

        Show
        Jared Flatow added a comment - This patch is WRONG. It would not work for the functions namespace. Because functions implements a custom map. look at EvaluatorBag.getFunctionsNameSpace(). Why shouldn't it work for the EvaluatorBag Functions? I am using the patch, as well as those functions... Moreover ,your version of the VariableResolverImpl is more expensive and creates more objects. Map.putAll() is far more expensive because it creates so many Map.Entry and removals are more costly. Given the size of the namespaces I doubt there is a major performance penalty (it only creates "so many Map.Entry"s if they exist in the namespace already), but if that is a concern it can be avoided by getting the parent namespace first. Can you tell me what are you trying to achieve. If you can give me usecases It may help me solve it better. I want to define the behavior of the VariableResolver better so it is more usable. There are significant ambiguities with the existing implementation. The "DOT_SPLIT and backup" of namespace names is incorrect. What if I have a namespace "a.b" containing a mapping with key "c.d", what does "a.b.c.d" refer to? The answer currently is it depends what other namespaces I have. Ideally, I would like to see the VariableResolver interface define the correct way to put values in/out of a namespace, and tell you which values are going to be overwritten when. This will make writing transformers more straightforward.
        Hide
        Noble Paul added a comment -

        If that is the case then please take a look at patch 993b.

        This patch is WRONG. It would not work for the functions namespace. Because functions implements a custom map. look at EvaluatorBag.getFunctionsNameSpace(). Moreover ,your version of the VariableResolverImpl is more expensive and creates more objects. Map.putAll() is far more expensive because it creates so many Map.Entry and removals are more costly.

        Can you tell me what are you trying to achieve. If you can give me usecases It may help me solve it better.

        vr.removeNamespace(null) is a legacy code can be removed.

        Show
        Noble Paul added a comment - If that is the case then please take a look at patch 993b. This patch is WRONG. It would not work for the functions namespace. Because functions implements a custom map. look at EvaluatorBag.getFunctionsNameSpace(). Moreover ,your version of the VariableResolverImpl is more expensive and creates more objects. Map.putAll() is far more expensive because it creates so many Map.Entry and removals are more costly. Can you tell me what are you trying to achieve. If you can give me usecases It may help me solve it better. vr.removeNamespace(null) is a legacy code can be removed.
        Hide
        Shalin Shekhar Mangar added a comment -

        Sure, I'll take a look at the patch.

        Also, shouldn't more functionality be added to the interface so other classes can stop relying on the Impl?

        Can you give an example?

        Show
        Shalin Shekhar Mangar added a comment - Sure, I'll take a look at the patch. Also, shouldn't more functionality be added to the interface so other classes can stop relying on the Impl? Can you give an example?
        Hide
        Jared Flatow added a comment -

        If that is the case then please take a look at patch 993b. It should keep the desired behavior and add some functionality to simplify using the VariableResolverImpl. I cleaned up some of the calls to the resolver to demonstrate this. Also, shouldn't more functionality be added to the interface so other classes can stop relying on the Impl?

        Show
        Jared Flatow added a comment - If that is the case then please take a look at patch 993b. It should keep the desired behavior and add some functionality to simplify using the VariableResolverImpl. I cleaned up some of the calls to the resolver to demonstrate this. Also, shouldn't more functionality be added to the interface so other classes can stop relying on the Impl?
        Hide
        Shalin Shekhar Mangar added a comment -

        I am confused by the implementation of namespaces in the VariableResolverImpl. Why are they implemented as chained hashmaps and not flat?

        There are many different types of namespaces. For example "dataimporter" which is used for last_index_time, request parameters, functions etc. Most of the variables needed to be resolved are dynamic and cannot be pre-computed. They are added as a Map of variable names to values just-in-time to a namespace. For example each of the field's values are available in the entity name's namespace. Two entities may have the same field name (e.g. ID is a common name for a primary key and can be present in more than one entities). Therefore a chained map gives us the flexibility of avoiding conflicts and too many top-level names.

        Also consider the 'functions' namespace used for Evaluators. The evaluator map is actually a dummy object allows us to just-in-time evaluation by overriding the get method.

        The remove method seems to treat them as flat, which seems totally inconsistent?

        addNamespace adds a complete namespace and the removeNamespace method removes a complete namespace.

        vr.removeNamespace(null);

        I think this was being used as a way to clear the resolver to prepare for the next document. But this is a no-op because the removeNamespace method doesn't do anything if the parameter is null. Need to spend some time to see if it can be removed or replaced with something else.

        Show
        Shalin Shekhar Mangar added a comment - I am confused by the implementation of namespaces in the VariableResolverImpl. Why are they implemented as chained hashmaps and not flat? There are many different types of namespaces. For example "dataimporter" which is used for last_index_time, request parameters, functions etc. Most of the variables needed to be resolved are dynamic and cannot be pre-computed. They are added as a Map of variable names to values just-in-time to a namespace. For example each of the field's values are available in the entity name's namespace. Two entities may have the same field name (e.g. ID is a common name for a primary key and can be present in more than one entities). Therefore a chained map gives us the flexibility of avoiding conflicts and too many top-level names. Also consider the 'functions' namespace used for Evaluators. The evaluator map is actually a dummy object allows us to just-in-time evaluation by overriding the get method. The remove method seems to treat them as flat, which seems totally inconsistent? addNamespace adds a complete namespace and the removeNamespace method removes a complete namespace. vr.removeNamespace(null); I think this was being used as a way to clear the resolver to prepare for the next document. But this is a no-op because the removeNamespace method doesn't do anything if the parameter is null. Need to spend some time to see if it can be removed or replaced with something else.
        Hide
        Jared Flatow added a comment -

        I am confused by the implementation of namespaces in the VariableResolverImpl. Why are they implemented as chained hashmaps and not flat? The remove method seems to treat them as flat, which seems totally inconsistent? Why does DocBuilder have on line 355:

        if (isRoot)
         vr.removeNamespace(null);
        

        What is this supposed to accomplish?

        I have attempted to patch the implementation with what I believe is the desired behavior. I added an addToNamespace function as well.

        Show
        Jared Flatow added a comment - I am confused by the implementation of namespaces in the VariableResolverImpl. Why are they implemented as chained hashmaps and not flat? The remove method seems to treat them as flat, which seems totally inconsistent? Why does DocBuilder have on line 355: if (isRoot) vr.removeNamespace( null ); What is this supposed to accomplish? I have attempted to patch the implementation with what I believe is the desired behavior. I added an addToNamespace function as well.
        Hide
        Noble Paul added a comment -

        addNameSpace() is adding a namespace and replace if that exists. it is equivalent to map.put().

        The functionality that you want is something like addToNamespace()

        Show
        Noble Paul added a comment - addNameSpace() is adding a namespace and replace if that exists. it is equivalent to map.put(). The functionality that you want is something like addToNamespace()
        Hide
        Jared Flatow added a comment -

        It is actually the fact that other places in the code, like XPathEntityProcessor, which use addNamespace, seem like they shouldn't be overwriting whatever else is there. What is the intended behavior of addNamespace?

        Show
        Jared Flatow added a comment - It is actually the fact that other places in the code, like XPathEntityProcessor, which use addNamespace, seem like they shouldn't be overwriting whatever else is there. What is the intended behavior of addNamespace?
        Hide
        Noble Paul added a comment -

        adding to an existing namespace can be done by this
        assume that you have a namespace "a.b" and you wish to add a variable "c" to the same namespace

        try this

        ((Map)variableresolverimpl.resolve("a.b")).put("c","Hello C")
        

        is that what you wish to do?

        Show
        Noble Paul added a comment - adding to an existing namespace can be done by this assume that you have a namespace "a.b" and you wish to add a variable "c" to the same namespace try this ((Map)variableresolverimpl.resolve( "a.b" )).put( "c" , "Hello C" ) is that what you wish to do?
        Hide
        Jared Flatow added a comment -

        Should solve the issue with some additional cleanups to related files.

        Show
        Jared Flatow added a comment - Should solve the issue with some additional cleanups to related files.

          People

          • Assignee:
            Noble Paul
            Reporter:
            Jared Flatow
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:

              Time Tracking

              Estimated:
              Original Estimate - 5m
              5m
              Remaining:
              Remaining Estimate - 5m
              5m
              Logged:
              Time Spent - Not Specified
              Not Specified

                Development