Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.0
    • Component/s: None
    • Labels:
      None

      Description

      Comment handling is not currently fully documented / tested.

      There are several possible situations:
      1) trailing comment following one or more values
      2) comment marker starts in the first column
      3) comment marker starts in the first non-whitespace column

      How should these be handled?
      1) The trailing comment should be ignored
      2) Entire line should be ignored, i.e. don't treat it as a blank line
      3) This is trickier: if whitespace is being trimmed, treat as 2, else treat as 1, i.e. there is a single value containing whitespace

      It might also be useful to consider returning comments to the application program as part of CSVRecord.

        Issue Links

          Activity

          Hide
          Sebb added a comment -

          At present (r1304503), the comment introducer is not recognised whilst processing a value.
          It is only recognised at the start of a value.

          This seems wrong, as it makes it impossible to add a trailing comment to the end of a record without adding a preceeding delimiter. This is not consistent, see below:

          # No comment following, 4 values found
          a,b,c,d<EOL>
          d,e,f,g# comment treated as part of 4th value<EOL>
          h,i,j,k,# this will be recognised as a comment following 4 fields<EOL>
          l,m,n,o,<EOL>
          # previous line should have 5 fields but actually gets merged to previous line.
          

          Comments currently only work correctly at the start of a record.

          Show
          Sebb added a comment - At present (r1304503), the comment introducer is not recognised whilst processing a value. It is only recognised at the start of a value. This seems wrong, as it makes it impossible to add a trailing comment to the end of a record without adding a preceeding delimiter. This is not consistent, see below: # No comment following, 4 values found a,b,c,d<EOL> d,e,f,g# comment treated as part of 4th value<EOL> h,i,j,k,# this will be recognised as a comment following 4 fields<EOL> l,m,n,o,<EOL> # previous line should have 5 fields but actually gets merged to previous line. Comments currently only work correctly at the start of a record.
          Hide
          Emmanuel Bourg added a comment -

          I have never seen trailing comments in the wild, is there an application actually producing this kind of files?

          Regarding case 3, I would handle it exactly like the case 2. There is no point returning an empty line. Empty lines are useless in almost every case.

          Show
          Emmanuel Bourg added a comment - I have never seen trailing comments in the wild, is there an application actually producing this kind of files? Regarding case 3, I would handle it exactly like the case 2. There is no point returning an empty line. Empty lines are useless in almost every case.
          Hide
          Sebb added a comment -

          I don't know if there are formats with inline comments.
          The code currently recognises comments at the start of a value as well as at the start of a record.
          This behaviour may be unintentional.

          I agree that empty lines are generally not needed, but a comment line may not be useless.

          Show
          Sebb added a comment - I don't know if there are formats with inline comments. The code currently recognises comments at the start of a value as well as at the start of a record. This behaviour may be unintentional. I agree that empty lines are generally not needed, but a comment line may not be useless.
          Hide
          Emmanuel Bourg added a comment -

          That's most likely unintentional. This case is not covered by the tests, and CSVPrinter always output the comments on a new line.

          Show
          Emmanuel Bourg added a comment - That's most likely unintentional. This case is not covered by the tests, and CSVPrinter always output the comments on a new line.
          Hide
          Sebb added a comment -

          OK, so we should only allow comments at the start of a record.

          This may be a bit tricky to achieve, as the lexer knows about tokens, not records.

          Show
          Sebb added a comment - OK, so we should only allow comments at the start of a record. This may be a bit tricky to achieve, as the lexer knows about tokens, not records.
          Hide
          Sebb added a comment -

          This case is not covered by the tests,

          In fact - as I discovered when I tried to fix the code - it is tested.
          The test CSVParserTest.testDefaultFormat() includes the CSV source:

          a,b\n
          "\n"," "\n
          "",#\n
          

          Note that the comment follows the empty quoted value.

          However, I think this is mistake.

          Show
          Sebb added a comment - This case is not covered by the tests, In fact - as I discovered when I tried to fix the code - it is tested. The test CSVParserTest.testDefaultFormat() includes the CSV source: a,b\n "\n" , " " \n "",#\n Note that the comment follows the empty quoted value. However, I think this is mistake.
          Hide
          Sebb added a comment -

          Fixed test case, docs and code

          Show
          Sebb added a comment - Fixed test case, docs and code

            People

            • Assignee:
              Unassigned
              Reporter:
              Sebb
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development