Avro
  1. Avro
  2. AVRO-161

Add test to Python bindings for opening a non-empty file object container and successfully adding new elements

    Details

    • Type: Test Test
    • Status: Closed
    • Priority: Major Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: python
    • Labels:
      None

      Description

      Avro file object container are mutable and we should test that fact. testio.py currently just tests open a file for writing, writing a few records, closing the file, then reading the file back in.

      1. AVRO-161.patch
        3 kB
        Jeff Hammerbacher

        Issue Links

          Activity

          Jeff Hammerbacher created issue -
          Jeff Hammerbacher made changes -
          Field Original Value New Value
          Link This issue is related to AVRO-158 [ AVRO-158 ]
          Hide
          Jeff Hammerbacher added a comment -

          I've gotten this code path working in a slightly unusual fashion: I implemented Philip's suggestion in AVRO-158 to have a factory method called DataFileWriter.create_for_append() and I also added a factory method DataFileWriter.create_new() to replace the old, plain constructor. These new methods are implemented using Python's @classmethod decorator.

          It's been a while since I wrote production code, so this implementation may be hacky. I've gotten it working for realz in a script that you can see at http://github.com/hammer/avro-rpc-quickstart/blob/master/src/main/python/serialize_message.py. I will turn that patch into a new test in testio.py once I get confirmation on the approach I've taken.

          Looking forward to the feedback!

          Thanks,
          Jeff

          Show
          Jeff Hammerbacher added a comment - I've gotten this code path working in a slightly unusual fashion: I implemented Philip's suggestion in AVRO-158 to have a factory method called DataFileWriter.create_for_append() and I also added a factory method DataFileWriter.create_new() to replace the old, plain constructor. These new methods are implemented using Python's @classmethod decorator. It's been a while since I wrote production code, so this implementation may be hacky. I've gotten it working for realz in a script that you can see at http://github.com/hammer/avro-rpc-quickstart/blob/master/src/main/python/serialize_message.py . I will turn that patch into a new test in testio.py once I get confirmation on the approach I've taken. Looking forward to the feedback! Thanks, Jeff
          Jeff Hammerbacher made changes -
          Attachment AVRO-161.patch [ 12422999 ]
          Hide
          Philip Zeyliger added a comment -

          Seems reasonable from far away, though I'm not currently very familiar with the python code.

          You should add pydoc to _init_, create_new, and create_for_append, since it's not clear which one one is supposed to use. You should move the "from avro.genericio import DatumReader" up to the top of the file. In create_for_append and create_enw, the existence of the kwargs variable seems unnecessary; you might just do:

          return cls(writer, dwriter, schema=schema.parse(...),
            sync=...,
            codec=...,
            ...
          )
          

          Besides that, though, everything seems reasonable.

          Show
          Philip Zeyliger added a comment - Seems reasonable from far away, though I'm not currently very familiar with the python code. You should add pydoc to _ init _, create_new, and create_for_append, since it's not clear which one one is supposed to use. You should move the "from avro.genericio import DatumReader" up to the top of the file. In create_for_append and create_enw, the existence of the kwargs variable seems unnecessary; you might just do: return cls(writer, dwriter, schema=schema.parse(...), sync=..., codec=..., ... ) Besides that, though, everything seems reasonable.
          Jeff Hammerbacher made changes -
          Component/s python [ 12312781 ]
          Hide
          Jeff Hammerbacher added a comment -

          Hey,

          Could a Python committer please take a look at this patch? To address Philip's comments:

          You should add pydoc to init, create_new, and create_for_append, since it's not clear which one one is supposed to use.

          Sure, can do that. More generally, do we need a "create_new", or should _init_ default to new?

          You should move the "from avro.genericio import DatumReader" up to the top of the file.

          Can't due to some fun circular importing.

          In create_for_append and create_enw, the existence of the kwargs variable seems unnecessary

          Sure, I can do that. I thought kwargs was cleaner, but happy to have constructors with seven arguments if folks like them.

          Show
          Jeff Hammerbacher added a comment - Hey, Could a Python committer please take a look at this patch? To address Philip's comments: You should add pydoc to init , create_new, and create_for_append, since it's not clear which one one is supposed to use. Sure, can do that. More generally, do we need a "create_new", or should _ init _ default to new? You should move the "from avro.genericio import DatumReader" up to the top of the file. Can't due to some fun circular importing. In create_for_append and create_enw, the existence of the kwargs variable seems unnecessary Sure, I can do that. I thought kwargs was cleaner, but happy to have constructors with seven arguments if folks like them.
          Hide
          Sharad Agarwal added a comment -

          You should add pydoc to init, create_new, and create_for_append, since it's not clear which one one is supposed to use.

          Other alternative could be to use _init_(self, writer, dwriter, schm=None) which would assume 'append' if schema is not passed. Documentation can clearly mention that. This will avoid to have extra factory methods.

          Can't due to some fun circular importing.

          io.py should not be using generic.py as it is supposed to be other way round. I think we would have to move DataFileWriter and DataFileReader to different package say datafile.py on the same lines as java which has it in different package.

          Show
          Sharad Agarwal added a comment - You should add pydoc to init, create_new, and create_for_append, since it's not clear which one one is supposed to use. Other alternative could be to use _ init _(self, writer, dwriter, schm=None) which would assume 'append' if schema is not passed. Documentation can clearly mention that. This will avoid to have extra factory methods. Can't due to some fun circular importing. io.py should not be using generic.py as it is supposed to be other way round. I think we would have to move DataFileWriter and DataFileReader to different package say datafile.py on the same lines as java which has it in different package.
          Jeff Hammerbacher made changes -
          Link This issue is blocked by AVRO-201 [ AVRO-201 ]
          Hide
          Jeff Hammerbacher added a comment -

          I think we would have to move DataFileWriter and DataFileReader to different package say datafile.py on the same lines as java which has it in different package.

          Good idea, Sharad. I've made the change over at AVRO-201. If you review and commit that change, I'll clean up this patch and get it ready for submission using the new "datafile" module.

          Show
          Jeff Hammerbacher added a comment - I think we would have to move DataFileWriter and DataFileReader to different package say datafile.py on the same lines as java which has it in different package. Good idea, Sharad. I've made the change over at AVRO-201 . If you review and commit that change, I'll clean up this patch and get it ready for submission using the new "datafile" module.
          Jeff Hammerbacher made changes -
          Assignee Jeff Hammerbacher [ hammer ]
          Hide
          Jeff Hammerbacher added a comment -

          Fixed in AVRO-219

          Show
          Jeff Hammerbacher added a comment - Fixed in AVRO-219
          Jeff Hammerbacher made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Duplicate [ 3 ]
          Doug Cutting made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Transition Time In Source Status Execution Times Last Executer Last Execution Date
          Open Open Resolved Resolved
          63d 15h 47m 1 Jeff Hammerbacher 23/Dec/09 13:14
          Resolved Resolved Closed Closed
          68d 8h 32m 1 Doug Cutting 01/Mar/10 21:47

            People

            • Assignee:
              Jeff Hammerbacher
              Reporter:
              Jeff Hammerbacher
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development