Commons CSV / CSV-54

Confusing semantic of the ignore leading/trailing spaces parameters

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.0
    • Component/s: Parser
    • Labels:
      None

      Description

      CSVFormat has two parameters to control how the leading and trailing spaces around values are handled, but the actual behavior depends on the value being enclosed in quotes or not.

      If the value is not enclosed in quotes, setting leading/trailingSpacesIgnored to true trims the value on the left or on the right, respectively. For example, with this input (using the default format):

      a,  b  ,c

      the second value will be equal to 'b'.

      But if the value is enclosed in quotes, the value is no longer trimmed:

      a," b ",c

      this gives ' b '.

      With quoted values, the parser actually ignores the spaces between the delimiter and the opening quote, and between the closing quote and the next delimiter. Thus with this input:

      a, " b " ,c

      the value returned is ' b '.

      If leading/trailingSpacesIgnored is set to false, we instead get ' " b " ', which is consistent with RFC 4180.
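      The four cases above can be reproduced with a small Java model. It is a hypothetical sketch, not the Commons CSV parser: the class and method names are invented for illustration, it works on the raw text between two delimiters, and it ignores escaped quotes and embedded delimiters. In this model the spaces are stripped before the quote check, which is why spaces outside the quotes disappear while spaces inside them survive.

```java
/**
 * Hypothetical model of the field handling described in this issue.
 * Not the Commons CSV implementation: it receives the raw text between two
 * delimiters and does not handle escaped quotes or embedded delimiters.
 */
public final class SpacesIgnoredSketch {

    private SpacesIgnoredSketch() {}

    public static String parseField(String raw, boolean leadingSpacesIgnored, boolean trailingSpacesIgnored) {
        int start = 0;
        int end = raw.length();
        // Skip spaces (U+0020 only) on the left and/or right, depending on the flags.
        if (leadingSpacesIgnored) {
            while (start < end && raw.charAt(start) == ' ') {
                start++;
            }
        }
        if (trailingSpacesIgnored) {
            while (end > start && raw.charAt(end - 1) == ' ') {
                end--;
            }
        }
        String s = raw.substring(start, end);
        // The field counts as quoted only if what remains starts and ends with a quote;
        // the enclosed text is then returned verbatim, inner spaces included.
        if (s.length() >= 2 && s.charAt(0) == '"' && s.charAt(s.length() - 1) == '"') {
            return s.substring(1, s.length() - 1);
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println("[" + parseField("  b  ", true, true) + "]");        // [b]
        System.out.println("[" + parseField("\" b \"", true, true) + "]");      // [ b ]
        System.out.println("[" + parseField(" \" b \" ", true, true) + "]");    // [ b ]
        System.out.println("[" + parseField(" \" b \" ", false, false) + "]");  // [ " b " ]
    }
}
```

      The same two flags thus mean "trim the value" for unquoted fields but "skip the padding around the quotes" for quoted ones, which is the confusion this issue is about.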

        Activity

        Emmanuel Bourg added a comment -

        I suggest replacing leading/trailingSpacesIgnored with two parameters:

        • interleavedSpacesIgnored: this will ignore the spaces between the delimiter and the opening quote, and between the closing quote and the next delimiter.
        • trimmedSpaces: this will remove the spaces around the values, on the left and on the right. I don't see the need to trim only on one side and not the other.
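        A minimal sketch of how the two proposed options could interact, assuming the semantics in the bullets above (interleavedSpacesIgnored only affects the padding around quotes; trimmedSpaces trims the value on both sides). The class and method are hypothetical, not the Commons CSV API, and the sketch works on the raw text between two delimiters without handling escaped quotes.

```java
/**
 * Hypothetical model of the two options proposed above.
 * Illustrative only; not part of the Commons CSV API.
 */
public final class ProposedOptionsSketch {

    private ProposedOptionsSketch() {}

    /** Removes U+0020 spaces from both ends of s. */
    private static String stripSpaces(String s) {
        int start = 0;
        int end = s.length();
        while (start < end && s.charAt(start) == ' ') {
            start++;
        }
        while (end > start && s.charAt(end - 1) == ' ') {
            end--;
        }
        return s.substring(start, end);
    }

    private static boolean isQuoted(String s) {
        return s.length() >= 2 && s.charAt(0) == '"' && s.charAt(s.length() - 1) == '"';
    }

    public static String parseField(String raw, boolean interleavedSpacesIgnored, boolean trimmedSpaces) {
        // interleavedSpacesIgnored: tolerate spaces between the delimiters and the quotes.
        String candidate = interleavedSpacesIgnored ? stripSpaces(raw) : raw;
        // A quoted field yields its enclosed text; an unquoted field keeps its raw text.
        String value = isQuoted(candidate) ? candidate.substring(1, candidate.length() - 1) : raw;
        // trimmedSpaces: trim the value itself on both sides, quoted or not.
        return trimmedSpaces ? stripSpaces(value) : value;
    }

    public static void main(String[] args) {
        System.out.println("[" + parseField(" \" b \" ", true, false) + "]");  // [ b ]
        System.out.println("[" + parseField(" \" b \" ", true, true) + "]");   // [b]
        System.out.println("[" + parseField("  b  ", true, false) + "]");      // [  b  ]
        System.out.println("[" + parseField("  b  ", false, true) + "]");      // [b]
    }
}
```

        Returning `candidate` instead of `raw` for unquoted fields would give the variant in which interleavedSpacesIgnored also applies to unquoted values.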
        Sebb added a comment -

        I'm not sure I would expect any space removal to ever occur within quoted values.
        Surely that's one of the reasons why values are quoted - to prevent removal of spaces.

        Emmanuel Bourg added a comment -

        The reason for enclosing values in quotes is to put a delimiter or a line separator in the value. Spaces are always part of the value, quoted or not. At least that's how it's specified in RFC 4180.

        Sebb added a comment -

        Definitely better to drop the separate leading/trailing space options.

        However, I think "interleavedSpacesIgnored" should apply for both quoted and unquoted values.
        As far as I can tell, that is the expectation in the CSV format specs I've seen.

        The "trimmedSpaces" setting would then be identical to "interleavedSpacesIgnored" for unquoted values.
        For quoted values it would also trim the enclosed value; not sure that's particularly useful to provide as part of CSV.
        It's easy enough for the application to trim the fields, so I'm not sure the setting is necessary.
        Seems like it is straying away from the basic purpose of CSV.

        However, if/when CSV is updated to support creating Java Beans, value trimming would be much more appropriate there, as it would be on a per-column basis.

        Let's keep the initial parsing simple.
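        Trimming in the application, per column, can be sketched as a small post-processing step. This is a hypothetical helper, assuming each parsed record is available as a List<String>; it uses String.trim() (which also strips tabs and other characters up to U+0020) and List.of/Set.of (Java 9+).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical application-side helper; not part of Commons CSV. */
public final class AppSideTrimSketch {

    private AppSideTrimSketch() {}

    /** Returns a copy of the record with String.trim() applied to the selected column indices only. */
    public static List<String> trimColumns(List<String> record, Set<Integer> columnsToTrim) {
        List<String> result = new ArrayList<>(record.size());
        for (int i = 0; i < record.size(); i++) {
            String value = record.get(i);
            result.add(columnsToTrim.contains(i) ? value.trim() : value);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> record = List.of("a", "  b  ", " c ");
        // Trim only the second column; the third keeps its spaces.
        System.out.println(trimColumns(record, Set.of(1))); // [a, b,  c ]
    }
}
```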


          People

          • Assignee:
            Unassigned
            Reporter:
            Emmanuel Bourg
          • Votes:
            0
            Watchers:
            0

            Dates

            • Created:
              Updated:
              Resolved:
