Commons CSV
  1. Commons CSV
  2. CSV-68

Use the Builder pattern for CSVFormat

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.0
    • Component/s: None
    • Labels:
      None

      Description

      Using a builder pattern to create CSVFormat instances would allow the settings to be validated at creation time and would eliminate the need to keep creating new CSVFormat instances whilst still allowing the class to be immutable.

      A possible API is as follows:

      CSVFormat DEFAULT = CSVFormat.init(',') // delimiter is required
              .withEncapsulator('"')
              .withLeadingSpacesIgnored(true)
              .withTrailingSpacesIgnored(true)
              .withEmptyLinesIgnored(true)
              .withLineSeparator("\r\n") // optional, as it would be the default
              .build();
      
      CSVFormat format = CSVFormat.init(CSVFormat.DEFAULT) // alternatively start with pre-defined format
              .withSurroundingSpacesIgnored(false)
              .build();
      

      Compare this with the current syntax:

      // internal syntax; not easy to determine what all the parameters do
      CSVFormat DEFAULT1 = new CSVFormat(',', '"', DISABLED, DISABLED, true, true, false, true, CRLF);
      
      // external syntax
      CSVFormat format = CSVFormat.DEFAULT.withSurroundingSpacesIgnored(false);
      

      As a proof of concept I've written skeleton code which compiles (but needs completing).

      1. CVSFormat2Main.java
        0.9 kB
        Sebb
      2. CSVFormat2.java
        14 kB
        Sebb
      3. CSV-68.patch
        3 kB
        Sebb
      4. CSV-68_20121117.patch
        12 kB
        Benedikt Ritter
      5. CSV-68_20121115.patch
        4 kB
        Benedikt Ritter
      6. CSV-68_20121114.patch
        57 kB
        Benedikt Ritter
      7. CSV-68_20121111.patch
        56 kB
        Benedikt Ritter

        Issue Links

          Activity

          Hide
          Gary Gregory added a comment -

          "I did't add brackets in all if branches, to keep the code short. This may cause checkstyle errors."

          I changed the patch to match the component's format

          "The header array is now copied in the CSVFormat constructor."

          I simplified the code by cloning the array.

          "How ever, the reference to the header array can still escape the CSVFormatBuilder. This may cause issues if CSVFormatBuilder instances are shared between multiple threads. Do you think this is an issue that has to be addressed?"

          Well... the CSVFormat class claims to be immutable, which it is thanks to the header array fix.

          CSVFormatBuilder is not immutable by its nature. Do you foresee a case where the header passed to a builder who be modified by before the format is built? It would seem odd.

          Show
          Gary Gregory added a comment - "I did't add brackets in all if branches, to keep the code short. This may cause checkstyle errors." I changed the patch to match the component's format "The header array is now copied in the CSVFormat constructor." I simplified the code by cloning the array. "How ever, the reference to the header array can still escape the CSVFormatBuilder. This may cause issues if CSVFormatBuilder instances are shared between multiple threads. Do you think this is an issue that has to be addressed?" Well... the CSVFormat class claims to be immutable, which it is thanks to the header array fix. CSVFormatBuilder is not immutable by its nature. Do you foresee a case where the header passed to a builder who be modified by before the format is built? It would seem odd.
          Hide
          Gary Gregory added a comment -

          Patch tweaked and applied (https://issues.apache.org/jira/secure/attachment/12553894/CSV-68_20121117.patch). Thank you!

          commit -m "[CSV-68] Use the Builder pattern for CSVFormat." C:/svn/org/apache/commons/trunks-proper/csv/src/test/java/org/apache/commons/csv/CSVFormatTest.java C:/svn/org/apache/commons/trunks-proper/csv/src/test/java/org/apache/commons/csv/CSVFormatBuilderTest.java C:/svn/org/apache/commons/trunks-proper/csv/src/main/java/org/apache/commons/csv/CSVFormat.java
              Sending        C:/svn/org/apache/commons/trunks-proper/csv/src/main/java/org/apache/commons/csv/CSVFormat.java
              Sending        C:/svn/org/apache/commons/trunks-proper/csv/src/test/java/org/apache/commons/csv/CSVFormatBuilderTest.java
              Sending        C:/svn/org/apache/commons/trunks-proper/csv/src/test/java/org/apache/commons/csv/CSVFormatTest.java
              Transmitting file data ...
              Committed revision 1410759.
          
          Show
          Gary Gregory added a comment - Patch tweaked and applied ( https://issues.apache.org/jira/secure/attachment/12553894/CSV-68_20121117.patch ). Thank you! commit -m "[CSV-68] Use the Builder pattern for CSVFormat." C:/svn/org/apache/commons/trunks-proper/csv/src/test/java/org/apache/commons/csv/CSVFormatTest.java C:/svn/org/apache/commons/trunks-proper/csv/src/test/java/org/apache/commons/csv/CSVFormatBuilderTest.java C:/svn/org/apache/commons/trunks-proper/csv/src/main/java/org/apache/commons/csv/CSVFormat.java Sending C:/svn/org/apache/commons/trunks-proper/csv/src/main/java/org/apache/commons/csv/CSVFormat.java Sending C:/svn/org/apache/commons/trunks-proper/csv/src/test/java/org/apache/commons/csv/CSVFormatBuilderTest.java Sending C:/svn/org/apache/commons/trunks-proper/csv/src/test/java/org/apache/commons/csv/CSVFormatTest.java Transmitting file data ... Committed revision 1410759.
          Hide
          Benedikt Ritter added a comment -

          Looking at the patch again, it would probably to use Arrays.copy(T[], int) in favor of System.arraycopy. I'll do that after I have feedback for the overall changes included in the patch.

          Show
          Benedikt Ritter added a comment - Looking at the patch again, it would probably to use Arrays.copy(T[], int) in favor of System.arraycopy. I'll do that after I have feedback for the overall changes included in the patch.
          Hide
          Benedikt Ritter added a comment -

          Hi again,

          I have implemented equals and hashCode on CSVFormat, so that I could get rid of the assertEqualFormats method.
          I did't add brackets in all if branches, to keep the code short. This may cause checkstyle errors. Please comment how you feel about that.

          I also addressed the issue with the header array reference. The header array is now copied in the CSVFormat constructor. How ever, the reference to the header array can still escape the CSVFormatBuilder. This may cause issues if CSVFormatBuilder instances are shared between multiple threads. Do you think this is an issue that has to be addressed?

          TIA for review,
          Benedikt

          Show
          Benedikt Ritter added a comment - Hi again, I have implemented equals and hashCode on CSVFormat, so that I could get rid of the assertEqualFormats method. I did't add brackets in all if branches, to keep the code short. This may cause checkstyle errors. Please comment how you feel about that. I also addressed the issue with the header array reference. The header array is now copied in the CSVFormat constructor. How ever, the reference to the header array can still escape the CSVFormatBuilder. This may cause issues if CSVFormatBuilder instances are shared between multiple threads. Do you think this is an issue that has to be addressed? TIA for review, Benedikt
          Hide
          Gary Gregory added a comment -

          Patch https://issues.apache.org/jira/secure/attachment/12553698/CSV-68_20121115.patch applied, thank you!

          • If you implement equals(), don't forget a hashCode() to match.
          • Since we claim the format class to be immutable, then we MUST make a copy of the header array.

          Keep up the good work!

          Thank you,
          Gary

          Show
          Gary Gregory added a comment - Patch https://issues.apache.org/jira/secure/attachment/12553698/CSV-68_20121115.patch applied, thank you! If you implement equals(), don't forget a hashCode() to match. Since we claim the format class to be immutable, then we MUST make a copy of the header array. Keep up the good work! Thank you, Gary
          Hide
          Benedikt Ritter added a comment -

          The same applies to all the other non primitive parameters.

          Since Character and String instances are immutable, this certainly only applies to the header array.

          Show
          Benedikt Ritter added a comment - The same applies to all the other non primitive parameters. Since Character and String instances are immutable, this certainly only applies to the header array.
          Hide
          Benedikt Ritter added a comment -

          I have added a method that lets you initialize a CSVFormatBuilder with the values of a given CSVFormat. I also included two tests methods. There is a FIXME left in CSVFormatBuilderTest, because CSVFormat does not implement equals. After equals is implemented Assert.assertEquals(Object, Object) can be used to validate that the copied CSVFormat is equal to the one passed in.

          Now I wonder if it is a good idea to simply pass format.header to the CSVFormatBuilder in 441. This makes it possible to let a reference to the header array escape to the calling code. Maybe we have to create copy of the array? The same applies to all the other non primitive parameters.

          Best regards,
          Benedikt

          Show
          Benedikt Ritter added a comment - I have added a method that lets you initialize a CSVFormatBuilder with the values of a given CSVFormat. I also included two tests methods. There is a FIXME left in CSVFormatBuilderTest, because CSVFormat does not implement equals. After equals is implemented Assert.assertEquals(Object, Object) can be used to validate that the copied CSVFormat is equal to the one passed in. Now I wonder if it is a good idea to simply pass format.header to the CSVFormatBuilder in 441. This makes it possible to let a reference to the header array escape to the calling code. Maybe we have to create copy of the array? The same applies to all the other non primitive parameters. Best regards, Benedikt
          Hide
          Gary Gregory added a comment -

          Yes, that seems good.

          I committed https://issues.apache.org/jira/secure/attachment/12553560/CSV-68_20121114.patch so you can pick up trunk and keep on going. I then renamed the methods that create a builder newBuilder().

          Show
          Gary Gregory added a comment - Yes, that seems good. I committed https://issues.apache.org/jira/secure/attachment/12553560/CSV-68_20121114.patch so you can pick up trunk and keep on going. I then renamed the methods that create a builder newBuilder().
          Hide
          Benedikt Ritter added a comment -

          Hi Gary,

          being critical is a good thing, so no worries!

          I've created another patch that replaces pristine() with defaults(). So there is no need for DISABLED anymore.

          Now on thing is still missing: It is not possible to create a new format using the values of another. So another factory method is needed that takes a format as an argument and initializes the builder with the formats values.

          WDYT?
          Benedikt

          Show
          Benedikt Ritter added a comment - Hi Gary, being critical is a good thing, so no worries! I've created another patch that replaces pristine() with defaults(). So there is no need for DISABLED anymore. Now on thing is still missing: It is not possible to create a new format using the values of another. So another factory method is needed that takes a format as an argument and initializes the builder with the formats values. WDYT? Benedikt
          Hide
          Gary Gregory added a comment -

          Hello Benedikt,

          I'm playing a bit of the devil's advocate here, so please bear with me.

          1. Why are we still using CSVFormat.DISABLED? It feels left over from the pre-patch impl.
          2. Why not get rid of pristine() and use defaults() instead? That automatically addresses (1).

          YW and thank you for the patch
          Gary

          Show
          Gary Gregory added a comment - Hello Benedikt, I'm playing a bit of the devil's advocate here, so please bear with me. Why are we still using CSVFormat.DISABLED? It feels left over from the pre-patch impl. Why not get rid of pristine() and use defaults() instead? That automatically addresses (1). YW and thank you for the patch Gary
          Hide
          Benedikt Ritter added a comment -

          Hi Gary,

          the problem is, that the user has no possibility to make sure that a CSVFormat is valid. Internally we use the package private validate() method for this purpose. Using a builder makes sure that only valid formats can be created and there is no need to validate a format after construction.
          Another advantage of the builder is, that CSVFormat's interface is smaller and does only contain methods that are needs when working with CSVFormats. OTH you can not simply call one of the "withXXX" methods to create a copy of a format.

          Regards and thanks for review,
          Benedikt

          Show
          Benedikt Ritter added a comment - Hi Gary, the problem is, that the user has no possibility to make sure that a CSVFormat is valid. Internally we use the package private validate() method for this purpose. Using a builder makes sure that only valid formats can be created and there is no need to validate a format after construction. Another advantage of the builder is, that CSVFormat's interface is smaller and does only contain methods that are needs when working with CSVFormats. OTH you can not simply call one of the "withXXX" methods to create a copy of a format. Regards and thanks for review, Benedikt
          Hide
          Gary Gregory added a comment -

          Hm... so why is this:

          CSVFormat format = CSVFormat.defaults().withIgnoreSurroundingSpaces(false).build();
          

          better than:

          CSVFormat format = CSVFormat.DEFAULT.withIgnoreSurroundingSpaces(false);
          

          ?

          From a user's POV, I see defaults() vs DEFAULT, I'm indifferent, but now I have to call an extra method, build(). So the patch makes configuration MORE verbose. That does not sound right.

          Does it to anyone else?

          G

          Show
          Gary Gregory added a comment - Hm... so why is this: CSVFormat format = CSVFormat.defaults().withIgnoreSurroundingSpaces( false ).build(); better than: CSVFormat format = CSVFormat.DEFAULT.withIgnoreSurroundingSpaces( false ); ? From a user's POV, I see defaults() vs DEFAULT, I'm indifferent, but now I have to call an extra method, build(). So the patch makes configuration MORE verbose. That does not sound right. Does it to anyone else? G
          Hide
          Benedikt Ritter added a comment -

          It has been a while since this has been discussed and the API has changed a bit since then. I implement the builder not knowing, that sebb already created a patch for this (better browse JIRA before coding next time). Since my patch is more up to date, I'm uploading it anyway to get the discussion started again.

          Regards,
          Benedikt

          Show
          Benedikt Ritter added a comment - It has been a while since this has been discussed and the API has changed a bit since then. I implement the builder not knowing, that sebb already created a patch for this (better browse JIRA before coding next time). Since my patch is more up to date, I'm uploading it anyway to get the discussion started again. Regards, Benedikt
          Hide
          Sebb added a comment -

          I've just discovered another disadvantage of the current code.

          Removal of unicode escaping required changes to every single constructor call in CSVFormat, of which there are a lot.
          It was also necessary to be very careful to remove the correct parameter in the predefined formats and test cases.

          Similarly when adding a new parameter, a lot of code has to be changed.

          With the proposed builder pattern, the changes are much simpler, because there is only one constructor call to change (and its parameters are named, so it's obvious where it should be put).

          Show
          Sebb added a comment - I've just discovered another disadvantage of the current code. Removal of unicode escaping required changes to every single constructor call in CSVFormat, of which there are a lot. It was also necessary to be very careful to remove the correct parameter in the predefined formats and test cases. Similarly when adding a new parameter, a lot of code has to be changed. With the proposed builder pattern, the changes are much simpler, because there is only one constructor call to change (and its parameters are named, so it's obvious where it should be put).
          Hide
          Emmanuel Bourg added a comment -

          I suggest to remove the validation completely. No other CSV API do that, and the protection it provides is useless. Then the concept of an "invalid" format disappears and there is no need to create a builder.

          Show
          Emmanuel Bourg added a comment - I suggest to remove the validation completely. No other CSV API do that, and the protection it provides is useless. Then the concept of an "invalid" format disappears and there is no need to create a builder.
          Hide
          Sebb added a comment -

          OK, let's use build, which is only two characters longer. Is it really that important to save 2 characters?

          ==

          The current validate method cannot be called in the CSVFormat ctor, because the fluent chain may create an invalid instance part way through.

          Therefore the format method has to be called outside the class; it's very easy for the call to be omitted when the code is later maintained.

          It's not just so that the validate can be invoked earlier; it's to guarantee that an invalid format cannot be created.

          Show
          Sebb added a comment - OK, let's use build, which is only two characters longer. Is it really that important to save 2 characters? == The current validate method cannot be called in the CSVFormat ctor, because the fluent chain may create an invalid instance part way through. Therefore the format method has to be called outside the class; it's very easy for the call to be omitted when the code is later maintained. It's not just so that the validate can be invoked earlier; it's to guarantee that an invalid format cannot be created.
          Hide
          Emmanuel Bourg added a comment -

          Ok it's shorter but it doesn't make sense, what is the meaning of a "go" action on a format?

          Also now every withXXX() method is duplicated in the code.

          I find this far fetched just to ensure that in the unlikely case of someone using the same character for the delimiter, encapsulator or escape, a warning will be thrown ONE line earlier.

          Show
          Emmanuel Bourg added a comment - Ok it's shorter but it doesn't make sense, what is the meaning of a "go" action on a format? Also now every withXXX() method is duplicated in the code. I find this far fetched just to ensure that in the unlikely case of someone using the same character for the delimiter, encapsulator or escape, a warning will be thrown ONE line earlier.
          Hide
          Sebb added a comment -

          Sample builder and application that uses it

          Show
          Sebb added a comment - Sample builder and application that uses it
          Hide
          Sebb added a comment -

          How about:

          CSVFormat.withDelimiter(';').withEncapsulator('"').go();
          

          It's one character shorter than your example.

          Show
          Sebb added a comment - How about: CSVFormat.withDelimiter(';').withEncapsulator('"').go(); It's one character shorter than your example.
          Hide
          Emmanuel Bourg added a comment -

          This one would satisfy me:

          new CSVFormat().withDelimiter(';').withEncapsulator('"');
          
          Show
          Emmanuel Bourg added a comment - This one would satisfy me: new CSVFormat().withDelimiter(';').withEncapsulator('"');
          Hide
          Sebb added a comment -

          how about renaming init(char) to withDelimiter(char)

          The delimiter is different, in that it is the only parameter that must be specified.

          But one could still potentially have a withDelimiter() method that could be applied to an existing Builder chain, e.g. to modify an existing format. [The builder delim field is final. I might well change that and add the method.]

          Note that the first and last methods in the chain have different input/output classes, so should use a different naming convention.

          "CSVFormat.init()" could be replaced by "new CSVFormat.Builder()" but I thought it was slightly neater to have a static "newInstance()" method, and given that the delimiter needs to be defined, I thought it might as well be a parameter to the init() method.

          The init() method can of course be renamed.
          It cannot be renamed to withDelimiter() unless the existing method of the same name is dropped; but even without the name clash I think it would be a mistake to use a name prefixed "with" as the "newInstance" method as users would expect to be able to provide the parameters in any order.

          Just realised that this would also be possible if each withMethod were defined as a static method in CSVFormat as well as a class method in the Builder. [Only the build() method would be unique to the Builder.]

          The only extra method call required would then be the build() method.

          So the syntax would be

          CSVFormat.withDelimiter(',').withEncapsulator('"')...build();
          CSVFormat.excel().withDelimiter(';').build();
          

          Maybe that would satisfy everyone?

          Show
          Sebb added a comment - how about renaming init(char) to withDelimiter(char) The delimiter is different, in that it is the only parameter that must be specified. But one could still potentially have a withDelimiter() method that could be applied to an existing Builder chain, e.g. to modify an existing format. [The builder delim field is final. I might well change that and add the method.] Note that the first and last methods in the chain have different input/output classes, so should use a different naming convention. "CSVFormat.init()" could be replaced by "new CSVFormat.Builder()" but I thought it was slightly neater to have a static "newInstance()" method, and given that the delimiter needs to be defined, I thought it might as well be a parameter to the init() method. The init() method can of course be renamed. It cannot be renamed to withDelimiter() unless the existing method of the same name is dropped; but even without the name clash I think it would be a mistake to use a name prefixed "with" as the "newInstance" method as users would expect to be able to provide the parameters in any order. Just realised that this would also be possible if each withMethod were defined as a static method in CSVFormat as well as a class method in the Builder. [Only the build() method would be unique to the Builder.] The only extra method call required would then be the build() method. So the syntax would be CSVFormat.withDelimiter(',').withEncapsulator('"')...build(); CSVFormat.excel().withDelimiter(';').build(); Maybe that would satisfy everyone?
          Hide
          Simone Tripodi added a comment -

          FWIW, +1 to builder and fluent APIs

          Show
          Simone Tripodi added a comment - FWIW, +1 to builder and fluent APIs
          Hide
          Benedikt Ritter added a comment -

          Hey,

          how about renaming init(char) to withDelimiter(char) and then proving methods like:

          public static Builder rfc4180() // (I never understood why it is called default...)
          public static Builder excel()
          public static Builder tdf()
          

          Benedikt

          Show
          Benedikt Ritter added a comment - Hey, how about renaming init(char) to withDelimiter(char) and then proving methods like: public static Builder rfc4180() // (I never understood why it is called default ...) public static Builder excel() public static Builder tdf() Benedikt
          Hide
          Sebb added a comment -

          The extra verbosity is .init(',') and .build()
          Not a lot.

          The gains are:

          • much clearer setup of built-in formats
          • guaranteed validation
          Show
          Sebb added a comment - The extra verbosity is .init(',') and .build() Not a lot. The gains are: much clearer setup of built-in formats guaranteed validation
          Hide
          Emmanuel Bourg added a comment -

          As stated on the list I'm -1 on changing CSVFormat to this form, it's more verbose for no substantial gain. The delayed validation is not an issue.

          Show
          Emmanuel Bourg added a comment - As stated on the list I'm -1 on changing CSVFormat to this form, it's more verbose for no substantial gain. The delayed validation is not an issue.
          Hide
          Sebb added a comment - - edited

          Patch with skeleton implementation.

          The build() method should be updated to include validation.

          Show
          Sebb added a comment - - edited Patch with skeleton implementation. The build() method should be updated to include validation.

            People

            • Assignee:
              Unassigned
              Reporter:
              Sebb
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development