Thrift
  1. Thrift
  2. THRIFT-339

THRIFT-242 is incompatible with arguments with empty key fields

    Details

    • Type: Bug Bug
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: 0.1
    • Fix Version/s: None
    • Component/s: Python - Compiler
    • Labels:
      None
    • Patch Info:
      Patch Available

      Description

      Esteve's last change to how default values are stored broke stuff. Here is a quick example:

      {{
      service Test

      { bool get_slice(i32 start = -1), }

      }}

      generates

      {{
      class get_slice_args:

      thrift_spec = None
      def _init_(self, start=thrift_spec[-1][4],):
      self.start = start
      }}

      which is obviously invalid.

      I'm not sure how thrift_spec is supposed to be populated here so I'm unsure how to fix this.

      1. default+neg-field.diff
        13 kB
        Alexander Shigin
      2. default+neg-field-2.diff
        9 kB
        Alexander Shigin
      3. thrift-339.patch
        3 kB
        Esteve Fernandez
      4. thrift-339-2.patch
        3 kB
        Esteve Fernandez
      5. thrift-339-3.patch
        7 kB
        Esteve Fernandez
      6. thrift-339-4.patch
        7 kB
        Esteve Fernandez
      7. thrift-339-5.patch
        7 kB
        Esteve Fernandez
      8. thrift-339-6.patch
        9 kB
        Esteve Fernandez
      9. thrift-python-defaults.patch
        3 kB
        Alexander Shigin
      10. thrift-python-defaults-v2.patch
        3 kB
        Alexander Shigin

        Issue Links

          Activity

          Hide
          Jonathan Ellis added a comment -

          damn it, jira said {{ }} would monospace my code samples there.

          at least they are short enough that it's still pretty clear what is going on.

          Show
          Jonathan Ellis added a comment - damn it, jira said {{ }} would monospace my code samples there. at least they are short enough that it's still pretty clear what is going on.
          Hide
          Esteve Fernandez added a comment - - edited

          Although I'm happy to admit any breakage I cause I don't think this issue is valid (or at least, not for the reasons exposed). Note that you're not using a field key, and thus the thrift_spec variable is not populated at all. You can fix this using a positive field key:

          service Test

          { bool get_slice(1:i32 start = -1), }

          which should generate this code:

          class get_slice_args:

          thrift_spec = (
          None, # 0
          (1, TType.I32, 'start', None, -1, ), # 1
          )

          I think the compiler should abort if it doesn't find a valid field key and don't generate any code, though.

          Show
          Esteve Fernandez added a comment - - edited Although I'm happy to admit any breakage I cause I don't think this issue is valid (or at least, not for the reasons exposed). Note that you're not using a field key, and thus the thrift_spec variable is not populated at all. You can fix this using a positive field key: service Test { bool get_slice(1:i32 start = -1), } which should generate this code: class get_slice_args: thrift_spec = ( None, # 0 (1, TType.I32, 'start', None, -1, ), # 1 ) I think the compiler should abort if it doesn't find a valid field key and don't generate any code, though.
          Hide
          Jonathan Ellis added a comment -

          i thought the field keys are supposed to be optional, which is good, because they fuglify things up quite well.

          Show
          Jonathan Ellis added a comment - i thought the field keys are supposed to be optional, which is good, because they fuglify things up quite well.
          Hide
          Esteve Fernandez added a comment -

          I spoke too early, it seems my patch DID actually break things Here's the code that an old compiler generates:

          thrift_spec = None
          def _init_(self, d=None):
          self.start = -1
          if isinstance(d, dict):
          if 'start' in d:
          self.start = d['start']

          so, unless we always generate thrift_spec or we use a sentinel, I don't know if this can be fixed. However, I wonder how the fastbinary extension could work if thrift_spec is None. I can change the compiler to generate the thrift_spec variable no matter what, but would like to hear the opinion of others.

          Show
          Esteve Fernandez added a comment - I spoke too early, it seems my patch DID actually break things Here's the code that an old compiler generates: thrift_spec = None def _ init _(self, d=None): self.start = -1 if isinstance(d, dict): if 'start' in d: self.start = d ['start'] so, unless we always generate thrift_spec or we use a sentinel, I don't know if this can be fixed. However, I wonder how the fastbinary extension could work if thrift_spec is None. I can change the compiler to generate the thrift_spec variable no matter what, but would like to hear the opinion of others.
          Hide
          Esteve Fernandez added a comment - - edited

          Argh, I just realized that even if we always generated the thrift_spec variable, it wouldn't work as it would try to access thrift_spec[-1]

          Show
          Esteve Fernandez added a comment - - edited Argh, I just realized that even if we always generated the thrift_spec variable, it wouldn't work as it would try to access thrift_spec [-1]
          Hide
          Esteve Fernandez added a comment -

          It's quite late here, so I'm probably asking something stupid, but why is thrift_spec a tuple? Using a dict keyed on field keys, would allow the fastbinary extension to support negative field keys, IMHO. But I'm sure there must be a reason why it isn't, apart from dictionaries being mutable. I wish we could use named tuples in Python 2.4 and 2.5

          THRIFT-105 is slightly related to this, so Alexander can explain the problem with negative field keys much better than I.

          Show
          Esteve Fernandez added a comment - It's quite late here, so I'm probably asking something stupid, but why is thrift_spec a tuple? Using a dict keyed on field keys, would allow the fastbinary extension to support negative field keys, IMHO. But I'm sure there must be a reason why it isn't, apart from dictionaries being mutable. I wish we could use named tuples in Python 2.4 and 2.5 THRIFT-105 is slightly related to this, so Alexander can explain the problem with negative field keys much better than I.
          Hide
          David Reiss added a comment -

          I think it was a tuple for performance reasons. I just thought of a way that we might be able to simplify both this issue and THRIFT-105. tuples can take negative indexes (indices?) just like lists, and they are counted from the end. What if we just extended the thrift_spec tuple so that all of the negative fields were in the right place (after all of the positive fields)? So if you had a struct like

          struct foo {
            i32 bar;
            i32 baz;
            2: i32 qux;
          }
          

          then the thrift_spec would look like

          (
          BLANK, # 0 aka -5
          BLANK, # 1 aka -4
          spec_for_qux, # 2 aka -3
          spec_for_baz, # 3 aka -2
          spec_for_bar, # 4 aka -1
          )
          

          This would not disturb the existing thrift_spec fields at all, so the current fastbinary extension would work fine. It should be robust against adding new negative fields to the end, since the indexes (indices?) of the existing fields will not change. I think it would make the patch for THRIFT-105 a lot simpler too, though we might need a little bit of magic to make sure that negative indexes work in the C code.

          Thoughts?

          Show
          David Reiss added a comment - I think it was a tuple for performance reasons. I just thought of a way that we might be able to simplify both this issue and THRIFT-105 . tuples can take negative indexes (indices?) just like lists, and they are counted from the end. What if we just extended the thrift_spec tuple so that all of the negative fields were in the right place (after all of the positive fields)? So if you had a struct like struct foo { i32 bar; i32 baz; 2: i32 qux; } then the thrift_spec would look like ( BLANK, # 0 aka -5 BLANK, # 1 aka -4 spec_for_qux, # 2 aka -3 spec_for_baz, # 3 aka -2 spec_for_bar, # 4 aka -1 ) This would not disturb the existing thrift_spec fields at all, so the current fastbinary extension would work fine. It should be robust against adding new negative fields to the end, since the indexes (indices?) of the existing fields will not change. I think it would make the patch for THRIFT-105 a lot simpler too, though we might need a little bit of magic to make sure that negative indexes work in the C code. Thoughts?
          Hide
          Esteve Fernandez added a comment -

          This patch should fix this issue, I tested it using both the Python-only binary protocol and the native extension. This should fix THRIFT-105 as well.

          Show
          Esteve Fernandez added a comment - This patch should fix this issue, I tested it using both the Python-only binary protocol and the native extension. This should fix THRIFT-105 as well.
          Hide
          Esteve Fernandez added a comment -

          Argh, I forgot to add the changes I made to fastbinary.c in the previous patch, here they are.

          Show
          Esteve Fernandez added a comment - Argh, I forgot to add the changes I made to fastbinary.c in the previous patch, here they are.
          Hide
          Esteve Fernandez added a comment -

          I implemented it the way you suggested David, the changes to fastbinary.c were minimal and the compiler got a bit simpler. I've tested it, but it needs a review

          Show
          Esteve Fernandez added a comment - I implemented it the way you suggested David, the changes to fastbinary.c were minimal and the compiler got a bit simpler. I've tested it, but it needs a review
          Hide
          David Reiss added a comment -

          This looks awesome. There is one thing that I'm a little uneasy about, though, which is that (in my example), a field with id 3 (which should be skipped) would be treated as -2 instead. I think the only solution for this is to provide some extra info to the extension, specifically the maximum expected positive field id, and use that to skip over anything larger.

          Show
          David Reiss added a comment - This looks awesome. There is one thing that I'm a little uneasy about, though, which is that (in my example), a field with id 3 (which should be skipped) would be treated as -2 instead. I think the only solution for this is to provide some extra info to the extension, specifically the maximum expected positive field id, and use that to skip over anything larger.
          Hide
          Esteve Fernandez added a comment -

          Thanks! I think that information could be stored as the first member of the thrift_spec tuple (key 0), as it's already empty. It shouldn't take too much work (as it's already in the sorted_keys_pos variable). What do you think?

          Show
          Esteve Fernandez added a comment - Thanks! I think that information could be stored as the first member of the thrift_spec tuple (key 0), as it's already empty. It shouldn't take too much work (as it's already in the sorted_keys_pos variable). What do you think?
          Hide
          David Reiss added a comment -

          function result structures use field 0, so how about a new attribute?

          Show
          David Reiss added a comment - function result structures use field 0, so how about a new attribute?
          Hide
          Esteve Fernandez added a comment -

          Ouch, you're right. A new class attribute (e.g. thrift_tag_limits) for storing maximum and minimum field keys sounds good, but I worry we might end up having too many attributes in the future and I'd like to keep all this information in a single attribute. Anyway, I think I can implement it in a relatively short time.

          Show
          Esteve Fernandez added a comment - Ouch, you're right. A new class attribute (e.g. thrift_tag_limits) for storing maximum and minimum field keys sounds good, but I worry we might end up having too many attributes in the future and I'd like to keep all this information in a single attribute. Anyway, I think I can implement it in a relatively short time.
          Hide
          Esteve Fernandez added a comment -

          This patch adds some checks on lower and upper field keys using a new attribute called thrift_limits, I tested it both with the native extension and the Python-based binary protocol

          Show
          Esteve Fernandez added a comment - This patch adds some checks on lower and upper field keys using a new attribute called thrift_limits, I tested it both with the native extension and the Python-based binary protocol
          Hide
          Esteve Fernandez added a comment -

          Doesn't fix anything, but makes the compiler source code a bit simpler and cleaner.

          Show
          Esteve Fernandez added a comment - Doesn't fix anything, but makes the compiler source code a bit simpler and cleaner.
          Hide
          David Reiss added a comment -

          Something isn't right here. I think it has to do with resetting the sorted_key_pos inside the inner loop.

          For this structure...

          struct AllPos {
            2: i32 foo;
            3: i32 bar;
          }
          

          I get this output...

            thrift_spec = (
              None, # 0
              None, # 1
              (2, TType.I32, 'foo', None, None, ), # 2
              None, # 0
              None, # 1
              None, # 2
              (3, TType.I32, 'bar', None, None, ), # 3
            )
          
          Show
          David Reiss added a comment - Something isn't right here. I think it has to do with resetting the sorted_key_pos inside the inner loop. For this structure... struct AllPos { 2: i32 foo; 3: i32 bar; } I get this output... thrift_spec = ( None, # 0 None, # 1 (2, TType.I32, 'foo', None, None, ), # 2 None, # 0 None, # 1 None, # 2 (3, TType.I32, 'bar', None, None, ), # 3 )
          Hide
          Esteve Fernandez added a comment -

          Fixes repeated fields in thrift_spec

          Show
          Esteve Fernandez added a comment - Fixes repeated fields in thrift_spec
          Hide
          Esteve Fernandez added a comment -

          Fixes an issue that with negative field keys, all tests pass.

          Show
          Esteve Fernandez added a comment - Fixes an issue that with negative field keys, all tests pass.
          Hide
          Esteve Fernandez added a comment - - edited

          I had to rewrite my latest patch to fix structures with negative field keys, if they end with something lower than -1 For example, the ThriftTest.testException method generated a thrift_spec like this:

            thrift_spec = (
              (-2, TType.STRUCT, 'err1', (Xception, Xception.thrift_spec, Xception.thrift_limits), None, ), # -2
            )
          

          which caused some errors (i.e. thrift_spec[-2] didn't exist)

          It's not as clean or pretty as the previous patch but it works.

          Show
          Esteve Fernandez added a comment - - edited I had to rewrite my latest patch to fix structures with negative field keys, if they end with something lower than -1 For example, the ThriftTest.testException method generated a thrift_spec like this: thrift_spec = ( (-2, TType.STRUCT, 'err1', (Xception, Xception.thrift_spec, Xception.thrift_limits), None, ), # -2 ) which caused some errors (i.e. thrift_spec [-2] didn't exist) It's not as clean or pretty as the previous patch but it works.
          Hide
          Alexander Shigin added a comment -

          The patch fix default values. The patch is depend on THRIFT-105 patch. The patch duplicate default values in constructor and thrift_spec. I found it far better compare to thrift_spec[xxxx][4].

          The patch is small and (I believe) clean enough.

          Show
          Alexander Shigin added a comment - The patch fix default values. The patch is depend on THRIFT-105 patch. The patch duplicate default values in constructor and thrift_spec. I found it far better compare to thrift_spec [xxxx] [4] . The patch is small and (I believe) clean enough.
          Hide
          David Reiss added a comment -

          Esteve: We should be able to go back to patch #5 if my solution to THRIFT-361 is accepted. Thanks for uncovering this bug!

          Alexander: At the moment, I prefer the solution in Esteve's patch (I am also biased, because I tried to push him toward that implementation). The main reason is that it is very low impact. The old generated code will continue to work with the new extension and vice versa. This version allows you to simply index into the thrift_spec list with the field id and have it work (without having to add an offset). The limits are completely unnecessary as long as the data is assumed valid. Finally, it allows users to set a field to None in the constructor, overriding the default.

          Show
          David Reiss added a comment - Esteve: We should be able to go back to patch #5 if my solution to THRIFT-361 is accepted. Thanks for uncovering this bug! Alexander: At the moment, I prefer the solution in Esteve's patch (I am also biased, because I tried to push him toward that implementation). The main reason is that it is very low impact. The old generated code will continue to work with the new extension and vice versa. This version allows you to simply index into the thrift_spec list with the field id and have it work (without having to add an offset). The limits are completely unnecessary as long as the data is assumed valid. Finally, it allows users to set a field to None in the constructor, overriding the default.
          Hide
          Esteve Fernandez added a comment -

          Alexander: I don't fully understand your patch. Actually, it breaks the compiler and default values are not properly handled. For example, given this structure:

          struct Foo {
              1:optional list<i32> l = [1,2,3,4],
          }
          

          the code generated after applying your patch is:

            def __init__(self, l=None):
              if l is None:
                l = [
                  1,
                  2,
                  3,
                  4,
                ]
          

          which is wrong, which makes it impossible to pass None as l to Foo. Also, it doesn't set self.l to l, but I guess it was just a slip. With the current behavior, the generated code is:

            def __init__(self, l=thrift_spec[1][4],):
              if l is self.thrift_spec[1][4]:
                l = [
                1,
                2,
                3,
                4,
              ]
              self.l = l
          
          Show
          Esteve Fernandez added a comment - Alexander: I don't fully understand your patch. Actually, it breaks the compiler and default values are not properly handled. For example, given this structure: struct Foo { 1:optional list<i32> l = [1,2,3,4], } the code generated after applying your patch is: def __init__(self, l=None): if l is None: l = [ 1, 2, 3, 4, ] which is wrong, which makes it impossible to pass None as l to Foo. Also, it doesn't set self.l to l, but I guess it was just a slip. With the current behavior, the generated code is: def __init__(self, l=thrift_spec[1][4],): if l is self.thrift_spec[1][4]: l = [ 1, 2, 3, 4, ] self.l = l
          Hide
          Esteve Fernandez added a comment -

          David: great! However, I think the thrift_limits variable is necessary because, as you pointed out, if we have a structure like the one you added, a field with id 3 would be treated as -2. So, if we had a newer version of that structure, which adds another field (with id 3), wouldn't it clash with the old version?

          Show
          Esteve Fernandez added a comment - David: great! However, I think the thrift_limits variable is necessary because, as you pointed out, if we have a structure like the one you added, a field with id 3 would be treated as -2. So, if we had a newer version of that structure, which adds another field (with id 3), wouldn't it clash with the old version?
          Hide
          Alexander Shigin added a comment -

          1. Oops... I have to sleep more. I've fixed the bug.
          2. I can't imagine a scenario which makes you to pass None (instead of empty list/set/dict) to init. Can you give me one?

          Show
          Alexander Shigin added a comment - 1. Oops... I have to sleep more. I've fixed the bug. 2. I can't imagine a scenario which makes you to pass None (instead of empty list/set/dict) to init. Can you give me one?
          Hide
          Esteve Fernandez added a comment -

          Using the aforementioned structure (Foo), if I want l (an optional member) to be None so that it's not serialized, it would still end up passing [1,2,3,4] to the underlying transport.

          Actually, I don't know what's wrong with the current behavior. The problem is in how thrift_spec for negative field keys is being generated.

          Show
          Esteve Fernandez added a comment - Using the aforementioned structure (Foo), if I want l (an optional member) to be None so that it's not serialized, it would still end up passing [1,2,3,4] to the underlying transport. Actually, I don't know what's wrong with the current behavior. The problem is in how thrift_spec for negative field keys is being generated.
          Hide
          Jonathan Ellis added a comment -

          THRIFT-361 was applied. Does that mean we can apply patch #5 for this now, per David's comments above?

          Show
          Jonathan Ellis added a comment - THRIFT-361 was applied. Does that mean we can apply patch #5 for this now, per David's comments above?
          Hide
          David Reiss added a comment -

          Esteve: Good call on 3 vs. -2.

          Don't we also need to adjust type_to_spec_args in order to include the limits for nested structures?

          Also, it is unfortunate that adding the limits broke compatibility with the old version of the extension module. It doesn't seem like it will be easy to maintain compatibility, so as long as we are breaking it, we might as well go wild. If you want to combine the field specifiers and limits into a single attribute, that would be okay.

          Show
          David Reiss added a comment - Esteve: Good call on 3 vs. -2. Don't we also need to adjust type_to_spec_args in order to include the limits for nested structures? Also, it is unfortunate that adding the limits broke compatibility with the old version of the extension module. It doesn't seem like it will be easy to maintain compatibility, so as long as we are breaking it, we might as well go wild. If you want to combine the field specifiers and limits into a single attribute, that would be okay.
          Hide
          Bryan Duxbury added a comment -

          Is this something we could/should commit in 0.1? If so we need to make a final decision today.

          Show
          Bryan Duxbury added a comment - Is this something we could/should commit in 0.1? If so we need to make a final decision today.
          Hide
          Alexander Shigin added a comment -

          Another version of my patch + THRIFT-105.

          Show
          Alexander Shigin added a comment - Another version of my patch + THRIFT-105 .
          Hide
          Kevin Clark added a comment -

          Shooting for 0.1, if we get review.

          Show
          Kevin Clark added a comment - Shooting for 0.1, if we get review.
          Hide
          Alexander Shigin added a comment -

          I can be wrong, but Esteve's patch doesn't work with old generated code. I believe it's a problem (because we don't break backward compatibility in python library for 0.1).

          The big chunk of my patch was reviewed by Mark Slee in THRIFT-105.

          Show
          Alexander Shigin added a comment - I can be wrong, but Esteve's patch doesn't work with old generated code. I believe it's a problem (because we don't break backward compatibility in python library for 0.1). The big chunk of my patch was reviewed by Mark Slee in THRIFT-105 .
          Hide
          Jonathan Ellis added a comment -

          > I can be wrong, but Esteve's patch doesn't work with old generated code

          I'm confused. The whole point of this ticket is so we can replace old, broken generated code with generated code that works. If you are regenerating what is there to not work with?

          Personally I would be most comfortable to get Esteve's buy-in on whatever fix we go with since he was the author of the -242 patch IIRC.

          Show
          Jonathan Ellis added a comment - > I can be wrong, but Esteve's patch doesn't work with old generated code I'm confused. The whole point of this ticket is so we can replace old, broken generated code with generated code that works. If you are regenerating what is there to not work with? Personally I would be most comfortable to get Esteve's buy-in on whatever fix we go with since he was the author of the -242 patch IIRC.
          Hide
          Alexander Shigin added a comment -

          The Esteve's patch tries to fix two (THRIFT-105 and THRIFT-339) issue.

          The fix of fastbinary protocol requires three argument instead of two. The change breaks all old-generated code if you use fastbinary even if you hasn't got any fields with negative tags.

          The fix from THRIFT-105 (oh, it was six months ago) works fine with old-generated code, but it makes natural fields order: (-1, 0, 1, 2). Esteve uses order (0, 1, 2, -1).

          Show
          Alexander Shigin added a comment - The Esteve's patch tries to fix two ( THRIFT-105 and THRIFT-339 ) issue. The fix of fastbinary protocol requires three argument instead of two. The change breaks all old-generated code if you use fastbinary even if you hasn't got any fields with negative tags. The fix from THRIFT-105 (oh, it was six months ago) works fine with old-generated code, but it makes natural fields order: (-1, 0, 1, 2). Esteve uses order (0, 1, 2, -1).
          Hide
          David Reiss added a comment -

          Negative field ids are deprecated, so this doesn't need to block the release.

          Show
          David Reiss added a comment - Negative field ids are deprecated, so this doesn't need to block the release.
          Hide
          Jonathan Ellis added a comment -

          My case does not deal with negative field ids. (In the example I gave, the field default is -1, not the field id.)

          Show
          Jonathan Ellis added a comment - My case does not deal with negative field ids. (In the example I gave, the field default is -1, not the field id.)
          Hide
          Alexander Shigin added a comment - - edited
          bool get_slice(i32 start = -1),
          

          start has field id == -1

          bool get_slice(1: i32 start = -1),
          

          works fine

          Show
          Alexander Shigin added a comment - - edited bool get_slice(i32 start = -1), start has field id == -1 bool get_slice(1: i32 start = -1), works fine
          Hide
          Alexander Shigin added a comment -

          sync patch with head

          Show
          Alexander Shigin added a comment - sync patch with head
          Hide
          Bryan Duxbury added a comment -

          I'm deprioritizing this for now because I'm not sure what the status is, or if there's a champion for it.

          Show
          Bryan Duxbury added a comment - I'm deprioritizing this for now because I'm not sure what the status is, or if there's a champion for it.

            People

            • Assignee:
              Unassigned
              Reporter:
              Jonathan Ellis
            • Votes:
              1 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:

                Development