Avro
  1. Avro
  2. AVRO-158

permit data file appends from java

    Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.0
    • Component/s: java
    • Labels:
      None

      Description

      Avro file object container are mutable and we should test that fact.

      1. AVRO-158.patch
        9 kB
        Doug Cutting
      2. AVRO-158.patch
        9 kB
        Doug Cutting

        Issue Links

          Activity

          Hide
          Doug Cutting added a comment -

          I just committed this.

          Show
          Doug Cutting added a comment - I just committed this.
          Hide
          Philip Zeyliger added a comment -

          I'm +1 the current patch.

          Some responses to the comments:

          Do you have alternate wording?

          No, it's fine. It just seemed unrelated, so I was wondering what was going on.

          Factory methods make it hard to subclass.

          Do you expect DataFileWriter to be subclassed? It's not final, so presumably the answer is "yes". I'm fine with the current implementation, but something about DataFileWriter(File, DatumWriter) being so different than DataFileWriter(Schema, OutputStream, DatumWriter) strikes me as odd.

          Show
          Philip Zeyliger added a comment - I'm +1 the current patch. Some responses to the comments: Do you have alternate wording? No, it's fine. It just seemed unrelated, so I was wondering what was going on. Factory methods make it hard to subclass. Do you expect DataFileWriter to be subclassed? It's not final, so presumably the answer is "yes". I'm fine with the current implementation, but something about DataFileWriter(File, DatumWriter) being so different than DataFileWriter(Schema, OutputStream, DatumWriter) strikes me as odd.
          Hide
          Doug Cutting added a comment -

          > You removed "according to its metadata" from this comment.

          Do you have alternate wording? Arraylist#size() just says, "Returns the number of elements in this list." The arraylist class-level document does define order-of-algorithms bounds for different operations, including size(). So I suppose we could say, "Runs in constant time", although we don't bother giving the order of algorithms for most other methods here. Personally I think it's overkill. We should perhaps warn folks away from things that might be unexpectedly slow, but I don't see a need to tell them that something that they should expect to be fast is in fact fast.

          > I'd prefer DataFileWriter.createForAppend(File, DatumWriter) as a factory method.

          Factory methods make it hard to subclass. Probably we should make opening the writer a separate method(s) from the constructor, but that should be done in a separate issue, and could be done for both readers and writers and would be an incompatible change. If you feel strongly about this, please file that issue.

          > Perhaps better to catch NoSuchAlgorithmException explicitly?

          Done. Good catch.

          > In Junit3, setUp() and tearDown() were used for this.

          I think @Before and @After methods are what's replaced these. But they unfortunately are run before and after each @Test, so are a poor means to express dependencies. In practice, JUnit runs tests in the order they are defined. The file to which this test is added already depends on this, as do many others, so I don't see addressing that as relevant to the current issue.

          Show
          Doug Cutting added a comment - > You removed "according to its metadata" from this comment. Do you have alternate wording? Arraylist#size() just says, "Returns the number of elements in this list." The arraylist class-level document does define order-of-algorithms bounds for different operations, including size(). So I suppose we could say, "Runs in constant time", although we don't bother giving the order of algorithms for most other methods here. Personally I think it's overkill. We should perhaps warn folks away from things that might be unexpectedly slow, but I don't see a need to tell them that something that they should expect to be fast is in fact fast. > I'd prefer DataFileWriter.createForAppend(File, DatumWriter) as a factory method. Factory methods make it hard to subclass. Probably we should make opening the writer a separate method(s) from the constructor, but that should be done in a separate issue, and could be done for both readers and writers and would be an incompatible change. If you feel strongly about this, please file that issue. > Perhaps better to catch NoSuchAlgorithmException explicitly? Done. Good catch. > In Junit3, setUp() and tearDown() were used for this. I think @Before and @After methods are what's replaced these. But they unfortunately are run before and after each @Test, so are a poor means to express dependencies. In practice, JUnit runs tests in the order they are defined. The file to which this test is added already depends on this, as do many others, so I don't see addressing that as relevant to the current issue.
          Hide
          Philip Zeyliger added a comment -

          Overall, looks good.

          Some comments:

          "Return the number of records in the file."

          You removed "according to its metadata" from this comment. I'm not tied to the exact wording, but when I ran into this method for the first time, I assumed that it was going through the entire file and counting records, and thought, "boy, that's a bad idea". Exposing a bit of information about how it works, or what it's runtime is would be helpful.

          public DataFileWriter(File file, DatumWriter<D> dout)

          I don't mind convenience methods for File, but the fact that DataFileWriter(File, DatumWriter) is so very different from DataFileWriter(Schema, File, DatumWriter) bugs me a bit. I'd prefer DataFileWriter.createForAppend(File, DatumWriter) as a factory method. Isn't it possible to append with a different schema, btw? How does that work?

          To answer the question you posed, I think a factory method with an enum, or just several factory methods.

          if (!file.exists())

          The Sun style guidelines prefer braces around if statements.

          } catch (Exception e) {

          If the code happens to throw a RuntimeException (OOM, say), it will get re-wrapped in another RuntimeException, which seems unnecessary. Perhaps better to catch NoSuchAlgorithmException explicitly?

          testGenericAppend

          testGenericAppend looks like it depends on testGenericWrite executing first. I'm not that familiar with this version of JUnit, but is it a goal that the individual tests run standalone? In Junit3, setUp() and tearDown() were used for this.

          Show
          Philip Zeyliger added a comment - Overall, looks good. Some comments: "Return the number of records in the file." You removed "according to its metadata" from this comment. I'm not tied to the exact wording, but when I ran into this method for the first time, I assumed that it was going through the entire file and counting records, and thought, "boy, that's a bad idea". Exposing a bit of information about how it works, or what it's runtime is would be helpful. public DataFileWriter(File file, DatumWriter<D> dout) I don't mind convenience methods for File, but the fact that DataFileWriter(File, DatumWriter) is so very different from DataFileWriter(Schema, File, DatumWriter) bugs me a bit. I'd prefer DataFileWriter.createForAppend(File, DatumWriter) as a factory method. Isn't it possible to append with a different schema, btw? How does that work? To answer the question you posed, I think a factory method with an enum, or just several factory methods. if (!file.exists()) The Sun style guidelines prefer braces around if statements. } catch (Exception e) { If the code happens to throw a RuntimeException (OOM, say), it will get re-wrapped in another RuntimeException, which seems unnecessary. Perhaps better to catch NoSuchAlgorithmException explicitly? testGenericAppend testGenericAppend looks like it depends on testGenericWrite executing first. I'm not that familiar with this version of JUnit, but is it a goal that the individual tests run standalone? In Junit3, setUp() and tearDown() were used for this.
          Hide
          Doug Cutting added a comment -

          This patch adds a new DataFileWriter constructor that opens an existing data file. Would it be better to have a single contructor (or perhaps factory method) that accepts an "overwrite" flag? This would, when overwrite=false, if a file exists, open it for append, otherwise create a new one. If overwrite=true, it would always create a new file. Which is the most useful API for applications?

          Show
          Doug Cutting added a comment - This patch adds a new DataFileWriter constructor that opens an existing data file. Would it be better to have a single contructor (or perhaps factory method) that accepts an "overwrite" flag? This would, when overwrite=false, if a file exists, open it for append, otherwise create a new one. If overwrite=true, it would always create a new file. Which is the most useful API for applications?
          Hide
          Doug Cutting added a comment -

          Here's a patch that implements this.

          Show
          Doug Cutting added a comment - Here's a patch that implements this.

            People

            • Assignee:
              Doug Cutting
              Reporter:
              Jeff Hammerbacher
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development