Appendix C: About Comma Separated Format Recipient Files

The term "comma separated format" (or "tab separated format" or "CSV") is often used as a catchall term for all kinds of text-based data formats where the data is formatted in a line-by-line fashion. Each line contains one data record, and the columns of that record are separated by a comma, a tab, or some other separator character.

LISTSERV Maestro can correctly interpret comma separated text files in various formats as long as the following rules are applied:

·         Any character may be used as the separator character, although a comma, tab, or semicolon is conventional.

·         The same separator character must be used in all lines for the entire file.

·         All lines in the file must have the same number of columns, which means the same number of separator characters.

·         Empty columns may be created so that the same number of separator characters is present in every line of the file.

·         Having two separator characters in direct succession, without any characters in between, creates an empty column.

·         If a line begins with the separator character, then LISTSERV Maestro assumes the line begins with an empty column.

·         If a line ends with the separator character, then LISTSERV Maestro assumes the line ends with an empty column.

·         If the character that is used as the separator character also appears as part of the value of one or several of the column fields, then it is necessary to enclose those fields in quotation marks or another quote character.

The last rule listed above introduces the concept of "quoted values". As described, it is necessary to quote a value if the value contains the separator (because otherwise the separator would be interpreted as the start of another value). For LISTSERV Maestro to know how to deal with quoted values, it is necessary to tell LISTSERV Maestro whether or not the comma separated file contains any quoted values.

If a file does not contain any quoted values, then the additional rules explained below do not apply. Even if one of the usual quote characters (quotation marks or the apostrophe) appears anywhere in the file, LISTSERV Maestro interprets it as just another normal character.

However, such a file must not contain any value that includes the separator. If at least one value contains the separator, then this value must be quoted, and, because of this, the file becomes a file with quoted values again.
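The rules for files without quoted values can be illustrated with a minimal Python sketch. This is not LISTSERV Maestro's own code; the function name parse_unquoted and its parameters are hypothetical and chosen only for illustration. It splits every line on the separator, treats quote characters as ordinary text, and rejects files whose lines differ in column count.

```python
def parse_unquoted(text, separator=","):
    """Split each line on the separator; quote characters get no special treatment.

    Hypothetical helper for illustration, not part of LISTSERV Maestro.
    """
    rows = [line.split(separator) for line in text.splitlines()]
    # Rule: all lines must have the same number of columns.
    widths = {len(row) for row in rows}
    if len(widths) > 1:
        raise ValueError(f"inconsistent column counts: {sorted(widths)}")
    return rows


sample = "John,,Chicago,USA\n,Summers,London,GB\nKarl,Hauser,Frankfurt,\n"
rows = parse_unquoted(sample)
# Two separators in succession, a leading separator, and a trailing
# separator each yield an empty string for the corresponding column.
for row in rows:
    print(row)
```

Each row in the sample comes back with four columns, with empty strings standing in for the empty columns.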

If a file does contain quoted values (at least one of them), then it must follow these additional rules:

·         Any character, except for the separator character, can be used as the quote character (quotation marks or apostrophe are conventional). This character must be used both as the opening and closing quote and must be used for all quoted fields in the file.

·         A field must be quoted if it fulfills either of these two conditions:

o        If the field contains the separator character in the value, then the field must be quoted.

o        If the field contains the quote character in the value, and this quote character is also the first character of the value, then the field must be quoted. This also means that if the field contains the quote character, but not as the first character, then it is not necessary to quote the field.

·         It is not necessary that all fields are quoted. Only fields that fall into one of the two cases described above have to be quoted. However, it is legal to also quote fields which do not fulfill these conditions.

·         Usually one of two styles is used: one style quotes all fields (both the ones that have to be quoted and the ones which do not), while the other style quotes only those fields which have to be quoted (all others are left unquoted). LISTSERV Maestro is able to understand both of these styles, and also mixes of the two styles, as long as the rules described here are followed.

·         If a field is a "quoted field" and the quote character also appears as part of the value of the field, then this character must be escaped. Escape the quote character by using it twice, in direct succession. The double appearance of the quote character will be interpreted as a single appearance that is part of the field value.

·         If a field is an "unquoted field" and the quote character also appears as part of the value of the field, then this character must not be escaped. Quote-escaping is only necessary in quoted fields.

·         A "quoted field" is parsed from the file as follows: The field starts with the opening quote and ends with the next appearance of an unescaped quote character after the opening quote. The end of the field must then be followed by a separator character or by the end of the line. Trailing white space after the last field of the line is allowed.

·         The value of the field is the text between the two quotes, excluding the quotes. Any escaped quotes in the value will be un-escaped.

·         An "unquoted field" is parsed from the file as follows: The field starts with the first character and ends with the next appearance of the separator character (or the end of the line). The value of the field is the text from the first character up to, but not including, the separator character.
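The parsing rules in this list can be sketched as a small Python function. This is an illustrative reading of the rules above, not LISTSERV Maestro's actual implementation. The name parse_line and its parameters are hypothetical, and the sketch assumes that the separator and the quote are each a single character.

```python
def parse_line(line, separator=",", quote='"'):
    """Parse one line into field values (hypothetical helper, for illustration)."""
    fields = []
    i, n = 0, len(line)
    while True:
        if i < n and line[i] == quote:
            # Quoted field: read up to the next quote that is not doubled.
            i += 1
            chars = []
            while True:
                if i >= n:
                    raise ValueError("missing closing quote")
                if line[i] == quote:
                    if i + 1 < n and line[i + 1] == quote:
                        chars.append(quote)  # doubled quote -> one literal quote
                        i += 2
                        continue
                    i += 1  # this is the closing quote
                    break
                chars.append(line[i])
                i += 1
            fields.append("".join(chars))
            # After the closing quote: a separator, or the end of the line
            # (trailing white space after the last field is tolerated).
            if i < n and line[i] == separator:
                i += 1
                continue
            if line[i:].strip() == "":
                return fields
            raise ValueError(f"unexpected text after closing quote at column {i}")
        # Unquoted field: runs up to the next separator or the end of the line.
        # Quote characters inside it are kept as they are.
        end = line.find(separator, i)
        if end == -1:
            fields.append(line[i:])
            return fields
        fields.append(line[i:end])
        i = end + 1


print(parse_line('John,,"Chicago ""The Windy City"", Illinois",USA'))
print(parse_line('Karl "Big Boy",Hauser,Frankfurt,'))
```

A field is treated as quoted only when its first character is the quote character, which is why a quote character in the middle of an unquoted field passes through unchanged.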

Note: If only some fields in a file are quoted, especially if those fields appear near the end of the file, it is important to manually define the separator and quote character instead of allowing LISTSERV Maestro to attempt to parse the file automatically. By manually defining the separator and quote characters, LISTSERV Maestro is forced to look at the entire file and parse it according to the values entered for these characters. If LISTSERV Maestro attempts to parse the file automatically when only some, but not all, fields are quoted, then those fields may be parsed incorrectly or may be rejected as invalid.

Here are some examples:

Simple values, separated by comma, not quoted:

  John,Doe,Chicago,USA

  Lucy,Summers,London,GB

  Karl,Hauser,Frankfurt,D

This will be parsed as follows:

  John | Doe     | Chicago   | USA
  Lucy | Summers | London    | GB
  Karl | Hauser  | Frankfurt | D

 

Simple values, separated by comma, not quoted, with empty fields:

  John,,Chicago,USA

  ,Summers,London,GB

  Karl,Hauser,Frankfurt,

 

This will be parsed as follows:

  John    | (empty) | Chicago   | USA
  (empty) | Summers | London    | GB
  Karl    | Hauser  | Frankfurt | (empty)

 

Values, some of which contain a comma, separated by comma, quoted with <">:

Using the style that quotes all values:

  "John","Doe","Chicago, Illinois","USA"

  "Lucy","Summers","London, England","GB"

  "Karl","Hauser","Frankfurt","D"

Or using the style that quotes only the values that have to be quoted:

(The only values that have to be quoted in this example are the two values containing the separator character <,>.)

  John,Doe,"Chicago, Illinois",USA

  Lucy,Summers,"London, England",GB

  Karl,Hauser,Frankfurt,D

 

Both will be parsed as follows:

  John | Doe     | Chicago, Illinois | USA
  Lucy | Summers | London, England   | GB
  Karl | Hauser  | Frankfurt         | D

 

Values, some of which contain a comma, separated by comma, quoted with <">, with empty fields:

Using the style that quotes all values:

  "John","","Chicago, Illinois","USA"

  "","Summers","London, England","GB"

  "Karl","Hauser","Frankfurt",""

 

Or using the style that quotes only the values that have to be quoted:

(The only values that have to be quoted in this example are the two values containing the separator character <,>.)

  John,,"Chicago, Illinois",USA

  ,Summers,"London, England",GB

  Karl,Hauser,Frankfurt,

 

Both will be parsed as follows:

  John    | (empty) | Chicago, Illinois | USA
  (empty) | Summers | London, England   | GB
  Karl    | Hauser  | Frankfurt         | (empty)

 

Values, some of which contain a comma and some the quote character, separated by comma, quoted with <">:

Using the style that quotes all values:

  "John","Doe","Chicago ""The Windy City"", Illinois","USA"

  """Little"" Lucy","Summers","London, England","GB"

  "Karl ""Big Boy""","Hauser","Frankfurt","D"

 

Or using the style that quotes only the values that have to be quoted:

(The values that have to be quoted in this example are the two values containing the separator character <,> and also the first value of the second row, which starts with the quote character <">. In comparison, the first value of the third row does contain the quote character too, but not as the first character. Therefore this field does not have to be quoted and the quote character is therefore also not escaped.)

  John,Doe,"Chicago ""The Windy City"", Illinois",USA

  """Little"" Lucy",Summers,"London, England",GB

  Karl "Big Boy",Hauser,Frankfurt,D

Both will be parsed as follows:

  John           | Doe     | Chicago "The Windy City", Illinois | USA
  "Little" Lucy  | Summers | London, England                    | GB
  Karl "Big Boy" | Hauser  | Frankfurt                          | D
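Going in the other direction, the style that quotes only the values that have to be quoted can be sketched in Python as follows. The helper names format_field and format_line are hypothetical and are not part of LISTSERV Maestro. The sketch assumes single-character separator and quote characters and values that contain no line breaks.

```python
def format_field(value, separator=",", quote='"'):
    """Quote a value only when the rules require it (hypothetical helper)."""
    if separator in value or value.startswith(quote):
        # Inside a quoted field, every quote character is escaped by doubling it.
        return quote + value.replace(quote, quote * 2) + quote
    # Unquoted field: quote characters elsewhere in the value stay unescaped.
    return value


def format_line(values, separator=",", quote='"'):
    """Join the formatted fields of one record into a single line."""
    return separator.join(format_field(v, separator, quote) for v in values)


print(format_line(["John", "Doe", 'Chicago "The Windy City", Illinois', "USA"]))
print(format_line(['"Little" Lucy', "Summers", "London, England", "GB"]))
print(format_line(['Karl "Big Boy"', "Hauser", "Frankfurt", "D"]))
```

For these three records the sketch produces the same lines as the minimally quoted form of the last example.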