
PROGRAMS & LIBRARIES :

* Check the return values of all output functions (printf, putc, ...).
* Load amino acid weights from an external data file.
* Allow extra gap characters `.', `?', `~' (in all formats?).
* Must convert all `..' to `. .' in GCG header.
* Make the libraries thread-safe (msf+gcg).
* Use valid gap characters according to each format.
* Check nucleic NEXUS format names.
* Check errors vs. normal end of parsing.
* New `-q' (quiet) flag to suppress format message.
* New `-v' (verbose) flag to display all format checks and results.
* Fix memory leaks in FASTA and PHYLIP format detection.
* Make some sequence_t fields private.
* Warn for truncated values (names, accessions, ...).
* Check the validity of output sequence names in each format.
* Require at least 2 sequences in alignments, even in non-strict mode.
* Cannot handle a matched token larger than 16 kB.
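
The GCG header item above can be sketched as follows. In GCG files a `..'
marks the end of the header, so any `..' inside the header text has to be
broken up before output. This is only a minimal sketch: the function name
gcg_escape_dots is hypothetical and not part of the existing libraries, and
the choice to return a newly allocated string is an assumption.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not an existing library function).
 * Returns a newly allocated copy of `src' in which a space is inserted
 * between any two adjacent dots, so that no `..' remains.
 * Returns NULL if memory allocation fails; the caller frees the result. */
char *gcg_escape_dots(const char *src)
{
    size_t len = strlen(src);
    /* Worst case: one space inserted after every character. */
    char *dst = malloc(2 * len + 1);
    char *p = dst;
    size_t i;

    if (dst == NULL)
        return NULL;

    for (i = 0; i < len; i++) {
        *p++ = src[i];
        /* src[len] is the terminating NUL, so src[i + 1] is always valid. */
        if (src[i] == '.' && src[i + 1] == '.')
            *p++ = ' ';
    }
    *p = '\0';
    return dst;
}
```

With this sketch, gcg_escape_dots("a..b") yields `a. .b', and a run of three
dots becomes `. . .', so longer runs are handled as well.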

FORMATS :

* ASN1: New sequence format from NCBI.
* BAMBE: New alignment format (derived from PHYLIP).
* CLUSTAL: Handle positions range after sequence names.
* EMBL: Restore version parsing with new ID line format.
* FASTA: Add NCBI header format parsing.
* GENAL: New sequence format from GenAl program (may conflict with FASTA).
* GENBANK: Cannot handle sequences larger than 10 Mb.
* MAF: Multiple Alignment Format (from UCSC Genome Bioinformatics).
* MEGA: Only a single dataset is supported.
* NEXUS: Non-interleaved format.
* NEXUS: MESQUITE program seems to generate an invalid format.
* PHYLIP: Support for multiple alignments in the same file.
* PHYLIP: Exercise sequence names cleanup.
* PSA/XPSA: New alignment format from pftools package, cf. psa(5), xpsa(5).
* RSF: New `Rich Sequence Format' from GCG.
* SELEX: New alignment format (STOCKHOLM like).
* SPROT: Fix DE line output for new structured data.

DATABANKS :

* EMBL: Wrong author name (I13016 - rel_pat_unc_10_r109.dat).
* EMBL: Missing separator in RP line (GQ527172 - rel_std_inv_04_r109.dat).
* EMBL: Missing separator in RP line (AH000025 - rel_std_mus_02_r109.dat).
* EMBL: Missing separator in RP line (EU409559 - rel_std_phg_01_r109.dat).
* EMBL: Missing separator in RP line (D16449 - rel_std_pro_04_r109.dat).
* EMBL: RT line exceeds 80 characters (JA477713 - r110u016.dat).

* GENBANK: Keyword includes `;' character (JF681370 - gbenv41.seq).
* GENBANK: Invalid keyword separator `;  ' (U18916 - gbpln50.seq).
* GENBANK: Empty dblink field (AP012272 - nc1027.flat).
* GENBANK: Empty dblink field (AP012206 - nc1101.flat).
* GENBANK: Empty dblink field (AP012224 - nc1102.flat).
* GENBANK: Keyword includes `;' character (JN802672 - nc1205.flat).
* GENBANK: Keyword split across 2 lines (L13724).
* GENBANK: Single keyword split into parts: `SodCc; Le; 2 gene' (X87372).

* GENBANK_WGS: CDS sequence is missing indentation (CAAE01010487 - wgs.CAAE.gbff).

* GENPEPT: Invalid keyword separator `;  ' (U18916_1 - gppln50.seq).

* IMGT: Flat file has strange characters.
* IMGT: Many trailing spaces.
* IMGT: Primary accession number duplicated if secondary exists.

* REFSEQ: Unexpected `?' character (YP_004935261 - rsnc.1123.2011.gpff).
* REFSEQ: Unexpected `?' character (YP_004935348 - rsnc.1124.2011.gpff).
* REFSEQ: Unexpected `?' character (YP_004935261 - rsnc.1126.2011.gpff).
* REFSEQ: Unexpected `?' character (YP_004935261 - rsnc.1209.2011.gpff).

* UNIPROT: Ref 1 title has internal `"' (B2CI52 - uniprot_trembl.dat).
* UNIPROT: Ref 2 title has internal `"' (Q50565 - uniprot_trembl.dat).
* UNIPROT: Ref 1 title has internal `"' (Q4GX11 - uniprot_trembl.dat).

