Details

    • Type: Improvement Improvement
    • Status: Reopened
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: 1.0
    • Fix Version/s: 1.0
    • Component/s: Documentation, Parser, Printer
    • Labels:
      None

      Description

      Requirement

      To use the CSV parser and printer for SQL Dumps it would be nice if they could handle null values.

      Specification

      To distinguish between an empty or null value empty values always gets quoted to denote an empty string. The absence of an quote denotes a null value

      Configuration

      To activate the behavior call the method withNullObjectPatternEnabled of the CSVFormat with parameter true.

      Modifications

      See attached patch.

      Example

      This example using as base the DEFAULT CSVFormat modified by the NullObjectPattern behavior.

      Array-Data CSV-Data
      {null,"","A"," "}; ,"A",""," "
      {"",null,"A"," "} "",,"A"," "
      {"","A",null} "","A",
      NULL in DBMS proprietary CSV formats
      Product Strategy Documentation / Link
      PostgreSQL If NULL should be preserved all non NULL values gets quoted PostgreSQL 8.1 Manual
      MySQL NULL Values will be replaced by the letters NULL or escaped by \n not found, verified with MySQL Workbench
      MS SQL NULL values will be exported as empty strings (no quotes). Strings will be quoted if needed. Import can interpret them as null MSDN
      Oracle NULL Values will be replaced by the letters NULL Manual
      1. patch.txt
        16 kB
        Georg Tsakumagos
      2. CSV-93.diff
        11 kB
        Gary Gregory

        Issue Links

          Activity

          Hide
          Sebb added a comment -

          The problem is, this may clash with situations where textual data does not have to be quoted.

          I agree it would be useful to be able to represent null; however it needs to be done in a way that is unambiguous. This may require adding an option, e.g. to force text to be quoted.

          Show
          Sebb added a comment - The problem is, this may clash with situations where textual data does not have to be quoted. I agree it would be useful to be able to represent null; however it needs to be done in a way that is unambiguous. This may require adding an option, e.g. to force text to be quoted.
          Hide
          Georg Tsakumagos added a comment - - edited

          I think it is not a problem because the attached patch activate this behavior only if the new CSVFormat option WithNullObjectPatternEnabled is set.
          The handling of null values is a special treatment which must be activated.
          It is not activated by default.

          Please review the attached patch and judge it again.

          Show
          Georg Tsakumagos added a comment - - edited I think it is not a problem because the attached patch activate this behavior only if the new CSVFormat option WithNullObjectPatternEnabled is set. The handling of null values is a special treatment which must be activated. It is not activated by default. Please review the attached patch and judge it again.
          Hide
          Sebb added a comment -

          Sorry, I misread the issue.

          However, I'm still not sure that's the best (or complete) solution.
          Do you have any examples of existing applications that generate CSV data which include nulls?
          For example, how do SQL database utilities handle nulls?

          Show
          Sebb added a comment - Sorry, I misread the issue. However, I'm still not sure that's the best (or complete) solution. Do you have any examples of existing applications that generate CSV data which include nulls? For example, how do SQL database utilities handle nulls?
          Hide
          Georg Tsakumagos added a comment -

          Here are the results of research.

          Show
          Georg Tsakumagos added a comment - Here are the results of research.
          Hide
          Gary Gregory added a comment -

          Thoughts on this impl?

          Show
          Gary Gregory added a comment - Thoughts on this impl?
          Hide
          Gary Gregory added a comment -

          I should have said: Thoughts on the impl I attached https://issues.apache.org/jira/secure/attachment/12589089/CSV-93.diff

          Show
          Gary Gregory added a comment - I should have said: Thoughts on the impl I attached https://issues.apache.org/jira/secure/attachment/12589089/CSV-93.diff
          Hide
          Sebb added a comment -

          There are some Javadoc TODOs which really need to be fixed.

          I find the names of the new NULL methods confusing enough already!

          Show
          Sebb added a comment - There are some Javadoc TODOs which really need to be fixed. I find the names of the new NULL methods confusing enough already!
          Hide
          Gary Gregory added a comment -

          Good points.

          I am replacing my patch with a new version.

          The property name is now 'nullInputString' and after that I'd rename the property 'nullToString' to 'nullOutputString'.

                  /**
                   * Converts the given String to {@code null} when reading records.
                   * 
                   * @param nullInputString
                   *            the String to convert to {@code null} when reading records. No substitution occurs if {@code null}
                   * 
                   * @return This builder with the the specified input record separator
                   */
                  public CSVFormatBuilder withNullInputString(final String nullInputString) {
                      this.nullInputString = nullInputString;
                      return this;
                  }
          
              /**
               * Gets the String to convert to {@code null} when reading records.
               * 
               * @return the String to convert to {@code null} when reading records. No substitution occurs if {@code null}
               */
              public String getNullInputString() {
                  return nullInputString;
              }
          
          Show
          Gary Gregory added a comment - Good points. I am replacing my patch with a new version. The property name is now 'nullInputString' and after that I'd rename the property 'nullToString' to 'nullOutputString'. /** * Converts the given String to {@code null } when reading records. * * @param nullInputString * the String to convert to {@code null } when reading records. No substitution occurs if {@code null } * * @ return This builder with the the specified input record separator */ public CSVFormatBuilder withNullInputString( final String nullInputString) { this .nullInputString = nullInputString; return this ; } /** * Gets the String to convert to {@code null } when reading records. * * @ return the String to convert to {@code null } when reading records. No substitution occurs if {@code null } */ public String getNullInputString() { return nullInputString; }
          Hide
          Gary Gregory added a comment -

          Hm... or maybe there should just be one property used for both input and output. You just have to use two format objects if you want different input and output settings. That seems cleaner. The property would then be called 'nullString'.

          Show
          Gary Gregory added a comment - Hm... or maybe there should just be one property used for both input and output. You just have to use two format objects if you want different input and output settings. That seems cleaner. The property would then be called 'nullString'.
          Hide
          Sebb added a comment -

          Agreed, less is more in this case. +1 to nullString, and withNullString()

          Apart from anything else, it's hard to read nullInput - in some fonts the I looks like an l!

          Show
          Sebb added a comment - Agreed, less is more in this case. +1 to nullString, and withNullString() Apart from anything else, it's hard to read nullInput - in some fonts the I looks like an l!
          Hide
          Gary Gregory added a comment -
          commit -m "[CSV-93] Allow the handling of NULL values. Use a single property 'nullString' for both input and output processing. No substitutions occur if null. For reading, nullString is used to convert field values to null. For writing, nullString is used to output the given string instead of the empty string." C:/vcs/svn/apache/commons/trunks-proper/csv/src/main/java/org/apache/commons/csv/CSVParser.java C:/vcs/svn/apache/commons/trunks-proper/csv/src/main/java/org/apache/commons/csv/CSVPrinter.java C:/vcs/svn/apache/commons/trunks-proper/csv/src/main/java/org/apache/commons/csv/CSVFormat.java C:/vcs/svn/apache/commons/trunks-proper/csv/src/test/java/org/apache/commons/csv/CSVPrinterTest.java
              Sending        C:/vcs/svn/apache/commons/trunks-proper/csv/src/main/java/org/apache/commons/csv/CSVFormat.java
              Sending        C:/vcs/svn/apache/commons/trunks-proper/csv/src/main/java/org/apache/commons/csv/CSVParser.java
              Sending        C:/vcs/svn/apache/commons/trunks-proper/csv/src/main/java/org/apache/commons/csv/CSVPrinter.java
              Sending        C:/vcs/svn/apache/commons/trunks-proper/csv/src/test/java/org/apache/commons/csv/CSVPrinterTest.java
              Transmitting file data ...
              Committed revision 1495911.
          
          Show
          Gary Gregory added a comment - commit -m "[CSV-93] Allow the handling of NULL values. Use a single property 'nullString' for both input and output processing. No substitutions occur if null. For reading, nullString is used to convert field values to null. For writing, nullString is used to output the given string instead of the empty string." C:/vcs/svn/apache/commons/trunks-proper/csv/src/main/java/org/apache/commons/csv/CSVParser.java C:/vcs/svn/apache/commons/trunks-proper/csv/src/main/java/org/apache/commons/csv/CSVPrinter.java C:/vcs/svn/apache/commons/trunks-proper/csv/src/main/java/org/apache/commons/csv/CSVFormat.java C:/vcs/svn/apache/commons/trunks-proper/csv/src/test/java/org/apache/commons/csv/CSVPrinterTest.java Sending C:/vcs/svn/apache/commons/trunks-proper/csv/src/main/java/org/apache/commons/csv/CSVFormat.java Sending C:/vcs/svn/apache/commons/trunks-proper/csv/src/main/java/org/apache/commons/csv/CSVParser.java Sending C:/vcs/svn/apache/commons/trunks-proper/csv/src/main/java/org/apache/commons/csv/CSVPrinter.java Sending C:/vcs/svn/apache/commons/trunks-proper/csv/src/test/java/org/apache/commons/csv/CSVPrinterTest.java Transmitting file data ... Committed revision 1495911.
          Hide
          Georg Tsakumagos added a comment - - edited

          The generic strategy to map null values to a configurable string is very generic but leads to problems interpreting the data. The current solution is a simple substitution which everyone could easy implement in their specific glue layer. Nice to have but not really necessary.

          Thats why i prefer the quotes to signal an empty string. Replacing an null value with an constant string leads to big problems because you never could be sure that the chosen constant does not occur in the data.

          Maybe in the next run.
          Meanwhile i'll patch it locally.

          Show
          Georg Tsakumagos added a comment - - edited The generic strategy to map null values to a configurable string is very generic but leads to problems interpreting the data. The current solution is a simple substitution which everyone could easy implement in their specific glue layer. Nice to have but not really necessary. Thats why i prefer the quotes to signal an empty string. Replacing an null value with an constant string leads to big problems because you never could be sure that the chosen constant does not occur in the data. Maybe in the next run. Meanwhile i'll patch it locally.

            People

            • Assignee:
              Unassigned
              Reporter:
              Georg Tsakumagos
            • Votes:
              1 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:

                Development