Commons CSV
  1. Commons CSV
  2. CSV-23

Excel strategy uses wrong separator

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.0
    • Component/s: None
    • Labels:
      None

      Description

      The Excel strategy is defined as follows.

          public static CSVStrategy EXCEL_STRATEGY   = new CSVStrategy(',', '"', COMMENTS_DISABLED, ESCAPE_DISABLED, false, 
                                                                       false, false, false);
      

      However, when I do a "Save as" in Excel the separator used is actually ';'. Thus, parsing the CSV file as suggested in the JavaDoc of CSVParser fails.

      String[][] data =
         (new CSVParser(new StringReader("a;b\nc;d"), CSVStrategy.EXCEL_STRATEGY)).getAllValues();
      

      Simple test to reproduce:

      import java.io.IOException;
      import java.io.StringReader;
      
      import org.apache.commons.csv.CSVParser;
      import org.apache.commons.csv.CSVStrategy;
      
      public class CSVExcelStrategyBug {
      
      	public static void main(final String[] args) {
      		try {
      			System.out.println("Using ;");
      			parse("a;b\nc;d");
      			System.out.println();
      			System.out.println("Using ,");
      			parse("a,b\nc,d");
      		} catch (final IOException e) {
      			e.printStackTrace();
      		}
      
      	}
      
      	private static void parse(final String input) throws IOException {
      		final String[][] data = (new CSVParser(new StringReader(input), CSVStrategy.EXCEL_STRATEGY)).getAllValues();
      		for (final String[] row : data) {
      			System.out.print("[");
      			for (final String cell : row) {
      				System.out.print("(" + cell + ")");
      			}
      			System.out.println("]");
      		}
      	}
      
      }
      

      Actual output:

      Using ;
      [(a;b)]
      [(c;d)]
      
      Using ,
      [(a)(b)]
      [(c)(d)]
      

      Expected output:

      Using ;
      [(a)(b)]
      [(c)(d)]
      
      Using ,
      [(a,b)]
      [(c,d)]
      

        Activity

        Hide
        Emmanuel Bourg added a comment -

        I added a note in the Javadoc explaining that the list separator is specific to the locale. CSVFormat.EXCEL must be constant so it's not possible to guess the separator here.

        Show
        Emmanuel Bourg added a comment - I added a note in the Javadoc explaining that the list separator is specific to the locale. CSVFormat.EXCEL must be constant so it's not possible to guess the separator here.
        Hide
        Peter Koszek added a comment - - edited

        RFC 4180 defines commas to be field separators.
        The Excel strategy uses the local configuration to identify the separator.

        On Experts-Exchange we are told: "But in different countries the seperator is different. Some use a "," some use a ";" some use a "." some use a ":""
        At Microsoft Support we can read: "For most international versions, the default list separator is a semicolon ( ; ). However, in Visual Basic for Applications code, you must type the English function or property name and use a comma (,) as a list separator. "

        The following approach can help to predict a field separator.
        On Windows, read registry key "HKCU\Control Panel\International\sList" if possible.
        On other systems, try to avoid a collision with the floating point separator like this:

        DecimalFormatSymbols dfs = DecimalFormatSymbols.getInstance(Locale.getDefault());
        char decimalSeparator = dfs.getDecimalSeparator();
        char listSeparator = ';';
        // If the floating point separator is really a point, use comma
        if (decimalSeparator == '.') {
            listSeparator = ',';
        }
        

        CSV should be a standard, Excel is a specific application which uses the CSV standard in a special way.
        I wouldn't expect a CSV framework to be able to simulate Excel exactly.
        Using encapsulation with ", it is possible to have the same field and decimal separator.
        An 'EXCEL_STRATEGY' can't be created statically on application scope, it's session scope or even request scope.

        Show
        Peter Koszek added a comment - - edited RFC 4180 defines commas to be field separators. The Excel strategy uses the local configuration to identify the separator. On Experts-Exchange we are told: "But in different countries the seperator is different. Some use a "," some use a ";" some use a "." some use a ":"" At Microsoft Support we can read: "For most international versions, the default list separator is a semicolon ( ; ). However, in Visual Basic for Applications code, you must type the English function or property name and use a comma (,) as a list separator. " The following approach can help to predict a field separator. On Windows, read registry key "HKCU\Control Panel\International\sList" if possible. On other systems, try to avoid a collision with the floating point separator like this: DecimalFormatSymbols dfs = DecimalFormatSymbols.getInstance(Locale.getDefault()); char decimalSeparator = dfs.getDecimalSeparator(); char listSeparator = ';'; // If the floating point separator is really a point, use comma if (decimalSeparator == '.') { listSeparator = ','; } CSV should be a standard, Excel is a specific application which uses the CSV standard in a special way. I wouldn't expect a CSV framework to be able to simulate Excel exactly. Using encapsulation with ", it is possible to have the same field and decimal separator. An 'EXCEL_STRATEGY' can't be created statically on application scope, it's session scope or even request scope.
        Hide
        Gunnar Wagenknecht added a comment -

        That's interesting. So any Excel strategy needs to be prepared for different separators. I wonder if the Excel strategy should also use the regional settings or if there should be different Excel strategies. The current way confuses users. At least the JavaDoc needs a hint to the regional settings depended behavior. But IHMO forcing users to change their regional settings just to have some software reading CSV properly is not an option, is it?

        Show
        Gunnar Wagenknecht added a comment - That's interesting. So any Excel strategy needs to be prepared for different separators. I wonder if the Excel strategy should also use the regional settings or if there should be different Excel strategies. The current way confuses users. At least the JavaDoc needs a hint to the regional settings depended behavior. But IHMO forcing users to change their regional settings just to have some software reading CSV properly is not an option, is it?
        Hide
        Niall Pemberton added a comment -

        I did a saveAs csv using Excel 2007 on Windows and it used a comma as the separator.

        Apparently (according to the Excel Help) - Excel uses the "List Separtor" configured for windows - so I think Henri is probably right. You can check the separator your system is configured with by going to the Windows "Control Panel", selecting "Regional and Language Options" and clicking on the "Customize" button in the "Regional Options" tab

        Show
        Niall Pemberton added a comment - I did a saveAs csv using Excel 2007 on Windows and it used a comma as the separator. Apparently (according to the Excel Help) - Excel uses the "List Separtor" configured for windows - so I think Henri is probably right. You can check the separator your system is configured with by going to the Windows "Control Panel", selecting "Regional and Language Options" and clicking on the "Customize" button in the "Regional Options" tab
        Hide
        Henri Yandell added a comment -

        Excel on the Mac puts a comma in. I don't have it on Windows, but my suspicion is that this might be a locale specific issue rather than Windows on the Mac putting a semi-colin in.

        There are also DOS CSV and Windows CSV options in the Save As - the only difference appears to be newlines with the default CSV being the same as the DOS CSV.

        Show
        Henri Yandell added a comment - Excel on the Mac puts a comma in. I don't have it on Windows, but my suspicion is that this might be a locale specific issue rather than Windows on the Mac putting a semi-colin in. There are also DOS CSV and Windows CSV options in the Save As - the only difference appears to be newlines with the default CSV being the same as the DOS CSV.

          People

          • Assignee:
            Unassigned
            Reporter:
            Gunnar Wagenknecht
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development