------------------------------------------------------ URA 2055 ----------------
Author(s) : PERRIERE G., THIOULOUSE J.
On-line tools for sequence retrieval and multivariate statistics in molecular biology.
Computer Applications in the Biosciences
Serial Number : 12, 1996, pp. 63-69
Key-Words : World-Wide Web; Sequence data banks; Retrieval system; Multivariate analysis; Sequence analysis.

Abstract
We have developed a World-Wide Web server for browsing sequence collections structured in the ACNUC format and for performing multivariate analyses on sequences. General collections (such as GenBank or EMBL) as well as specialized data banks (such as Hovergen and NRSub) can be accessed. The system allows users to build complex queries, and the result of each query, represented as a list of sequences, is stored on the server. This list can then be reused to run multivariate analyses on the sequences. Two example applications are shown. The first is a study of codon usage, using correspondence analysis, on all the protein genes of Haemophilus influenzae Rd. This study makes it possible to identify the highly expressed genes and the integral membrane proteins of this organism. The second is an ordination of 70 aligned growth hormone protein sequences using principal coordinate analysis. With this method, we are able to recover the patterns of relationships between the sequences that were previously determined with tree-building programs.

E-Mail : perriere@biomserv.univ-lyon1.fr
Laboratoire de Biometrie, Genetique et Biologie des Populations (URA 2055) - Univ. C. Bernard LYON I - 69622 VILLEURBANNE CEDEX