Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.0
    • Component/s: Parser
    • Labels:
      None

      Description

      Commons CSV is missing some elements to help dealing with CSV file headers.

      With the current API one has to write the following code to read the fields by name:

      CSVParser parser = new CSVParser(in);
      Iterator<String[]> it = parser.iterator();
      
      // read the header
      String[] header = it.next();
      
      // build a name to index mapping
      Map<String, Integer> mapping = new HashMap<>();
      for (int i = 0; i < header.length; i++) {
          mapping.put(header[i], i);
      }
      
      // parse the records
      for (String[] record : parser) {
          Person person = new Person();
          person.setName(record[mapping.get("name")]);
          person.setEmail(record[mapping.get("email")]);
          person.setPhone(record[mapping.get("phone")]);
          persons.add(person);
      }
      

      The header should be defined in the format with something like this:

      CSVFormat format = CSVFormat.DEFAULT.withHeader();
      

      Then either the parser provides the column name to index mapping automatically:

      CSVFormat format = CSVFormat.DEFAULT.withHeader();
      CSVParser parser = new CSVParser(in, format);
      
      // parse the records
      for (String[] record : parser) {
          Person person = new Person();
          person.setName(record[parser.indexOf("name")]);
          person.setEmail(record[parser.indexOf("email")]);
          person.setPhone(record[parser.indexOf("phone")]);
          persons.add(person);
      } 
      

      or the parser returns a Map like structure similar to a JDBC ResultSet (replacing String[]):

      CSVFormat format = CSVFormat.DEFAULT.withHeader();
      
      for (CSVRecord record : format.parse(in)) {
          Person person = new Person();
          person.setName(record.get("name"));
          person.setEmail(record.get("email"));
          person.setPhone(record.get("phone"));
          persons.add(person);
      } 
      
      1. CSVFormat.java
        18 kB
        Emmanuel Bourg
      2. CSVParser.java
        9 kB
        Emmanuel Bourg
      3. CSVRecord.java
        2 kB
        Emmanuel Bourg

        Issue Links

          Activity

          Hide
          Sebb added a comment -

          If the file does not provide a header record, it should be possible for the user to provide one.

          Show
          Sebb added a comment - If the file does not provide a header record, it should be possible for the user to provide one.
          Hide
          Emmanuel Bourg added a comment -

          Ok, what about CSVFormat.withHeader(String...) ?

          CSVFormat format = CSVFormat.DEFAULT.withHeader("foo", "bar");
          

          If the array is empty it means the user expects a header in the file, otherwise the columns specified are used.

          Show
          Emmanuel Bourg added a comment - Ok, what about CSVFormat.withHeader(String...) ? CSVFormat format = CSVFormat.DEFAULT.withHeader( "foo" , "bar" ); If the array is empty it means the user expects a header in the file, otherwise the columns specified are used.
          Hide
          Sebb added a comment -

          It should still be possible to read a CSV file without a header and without providing one.

          Show
          Sebb added a comment - It should still be possible to read a CSV file without a header and without providing one.
          Hide
          Emmanuel Bourg added a comment -

          Yes it is. I'm attaching my suggested implementation.

          Show
          Emmanuel Bourg added a comment - Yes it is. I'm attaching my suggested implementation.
          Hide
          Sebb added a comment -

          I think the header should be initialised in the CSVParser ctor (failing that, could perhaps use an IODH idiom).
          As it stands, getRecord() calls format.getHeader() every time if that returns null.

          Also, I'm not sure why CSVRecord should ever have a null values array - surely a zero-length array would be easier to use?

          Also, the mapping applies to all records, so I don't think it belongs in the record.

          Show
          Sebb added a comment - I think the header should be initialised in the CSVParser ctor (failing that, could perhaps use an IODH idiom). As it stands, getRecord() calls format.getHeader() every time if that returns null. Also, I'm not sure why CSVRecord should ever have a null values array - surely a zero-length array would be easier to use? Also, the mapping applies to all records, so I don't think it belongs in the record.
          Hide
          Emmanuel Bourg added a comment -

          Ok for initializing the header in the constructor.

          I preferred a null array to save some memory.

          The intent is to have the mapping in the parser and share it with the records. Another implementation is to put a reference to the parser in the records and fetch the mapping from the parser. Not sure it's really better.

          Show
          Emmanuel Bourg added a comment - Ok for initializing the header in the constructor. I preferred a null array to save some memory. The intent is to have the mapping in the parser and share it with the records. Another implementation is to put a reference to the parser in the records and fetch the mapping from the parser. Not sure it's really better.
          Hide
          Sebb added a comment -

          I preferred a null array to save some memory.

          The empty string array is immutable and can be safely shared.

          Using null increases the code because have to check for null.

          Show
          Sebb added a comment - I preferred a null array to save some memory. The empty string array is immutable and can be safely shared. Using null increases the code because have to check for null.
          Hide
          Emmanuel Bourg added a comment -

          Ok. I'm going to commit what I have and we'll work out the details.

          Show
          Emmanuel Bourg added a comment - Ok. I'm going to commit what I have and we'll work out the details.
          Hide
          Emmanuel Bourg added a comment -

          I found an issue with the initialization in the constructor, it forces the constructor to declare IOException. If we can avoid adding a checked exception that would be nice. On the other hand the parser is probably created right after a FileReader, so the exception has to be handled already.

          Show
          Emmanuel Bourg added a comment - I found an issue with the initialization in the constructor, it forces the constructor to declare IOException. If we can avoid adding a checked exception that would be nice. On the other hand the parser is probably created right after a FileReader, so the exception has to be handled already.
          Hide
          Sebb added a comment -

          This seems to be complete

          Show
          Sebb added a comment - This seems to be complete

            People

            • Assignee:
              Unassigned
              Reporter:
              Emmanuel Bourg
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development