Yeast Gene Duplications
(Release 3, September 1999)
Ken Wolfe, Cathal Seoighe and Denis Shields
Genetics Department, Trinity College Dublin
Introduction:
This Web site contains data on duplicated genes in the yeast (Saccharomyces
cerevisiae) genome. It concentrates particularly on duplicated chromosomal
regions: cases where a cluster of genes in one region of the genome has a
cluster of homologues elsewhere in the genome, with both gene order and
transcriptional orientation conserved between the two copies. We have
proposed that the structure and distribution of these regions indicate that
the entire yeast genome underwent duplication at some stage in its distant
past, i.e., that yeast is a degenerate tetraploid. We have identified
52 duplicated regions (or "blocks") that are likely to have been derived
from the simultaneous duplication of the whole genome, and a further 32
candidate paired regions that may also be derived from genome duplication,
although the evidence for these is less convincing.
See:
Seoighe, C. and Wolfe, K.H. (1999) Updated map of duplicated regions in the yeast genome. Gene 238, 253-261.
Wolfe, K.H. and Shields, D.C. (1997) Molecular evidence for an ancient
duplication of the entire yeast genome. Nature 387, 708-713.
See also the "Chart of Duplications" by MIPS in the Yeast Genome Directory (H.W. Mewes et al., Nature 387 (suppl.), 33-34, 1997).
Release 3 of the data has:
- Smith-Waterman search method to identify paralogs
- Inclusion of candidate duplicate regions
- 39 tRNA gene pairs and one snRNA gene pair
Dataset:
We used a dataset of 5790 yeast proteins encoded by the 16 yeast nuclear
chromosomes. These are listed under the "Browse through chromosomes"
option in the Main Menu. We compiled this dataset in Dublin from gene lists
supplied by YPD, SGD and MIPS. Our aim is to produce a gene list that includes
all functional protein-coding genes but excludes spurious ORFs and Ty
elements. Spurious ORFs are labelled as "junk" in our listings of
chromosomes and were not included in the protein dataset used for BLAST
searches. If your favourite ORF has been labelled as junk, we apologise.
Thanks to: Jim Garrels (YPD), Mike Cherry (SGD) and Kaj Albermann (MIPS)
for providing gene lists.
Web pages for each gene:
There is a separate Web page for each gene, containing:
- A brief description of its function, from YPD.
- A list of the 10 genes on each side of it on the chromosome.
- The results of a BLASTP search of this gene versus our yeast
protein database; all BLASTP hits with scores >60 are listed. BLASTP
searches were carried out using the seg filter, which masks repetitive
regions of protein sequence.
- The results of a Smith-Waterman (SSEARCH) search with this
gene, again using the seg filter.
- The option to view a pairwise alignment for each BLAST or SSEARCH
hit listed. These alignments are produced using the Smith-Waterman algorithm.
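As a rough illustration of how the neighbour list and the hit list on a gene page could be assembled, here is a minimal Python sketch. The function names, the placeholder gene names, and the choice to sort hits by descending score are assumptions made for this example; they do not describe the scripts behind this site.

```python
from typing import List, Sequence, Tuple


def flanking_genes(chromosome: Sequence[str], gene: str,
                   flank: int = 10) -> Tuple[List[str], List[str]]:
    """Return up to `flank` genes on each side of `gene`.

    `chromosome` is an ordered list of gene names along one chromosome.
    Fewer than `flank` genes are returned near a chromosome end.
    """
    i = list(chromosome).index(gene)
    left = list(chromosome[max(0, i - flank):i])
    right = list(chromosome[i + 1:i + 1 + flank])
    return left, right


def reportable_hits(hits: Sequence[Tuple[str, float]],
                    min_score: float = 60) -> List[Tuple[str, float]]:
    """Keep hits scoring strictly above `min_score` (the >60 rule above).

    Sorting by descending score is an assumption of this sketch.
    """
    kept = [(name, score) for name, score in hits if score > min_score]
    return sorted(kept, key=lambda hit: hit[1], reverse=True)


# Hypothetical chromosome of 30 genes with placeholder names g0..g29.
example_chromosome = [f"g{i}" for i in range(30)]
left, right = flanking_genes(example_chromosome, "g3")
print(left)        # ['g0', 'g1', 'g2']
print(len(right))  # 10
```

Near a chromosome end the list on one side is simply shorter, as the `g3` example shows.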
Criteria for definition of duplicated chromosomal regions:
To qualify as a duplicated block in our analysis, a pair of chromosomal
regions must contain:
- At least 3 BLASTP hits to one another
- Log-normalised Smith-Waterman score of 17.5 or greater
- Conservation of gene order
- Conservation of transcriptional orientation
- Spacing of <= 30 genes between hits on each chromosome
We excluded duplicated genes located in the subtelomeric repeats, which
are highly similar among multiple chromosomes.
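One possible reading of these criteria is as a test applied to a list of paired hits between two regions. The Python sketch below makes several assumptions that are not stated above: gene positions are integer indices along each chromosome, "spacing" means the difference in index between consecutive hits, the score threshold applies to every hit counted, and an inverted block (reversed order with flipped orientations) is accepted as conserved. The `Hit` type and `is_candidate_block` function are hypothetical names for this example only.

```python
from typing import NamedTuple, Sequence


class Hit(NamedTuple):
    pos_a: int     # gene index along the chromosome carrying region A
    strand_a: int  # +1 or -1: transcriptional orientation in region A
    pos_b: int     # gene index along the chromosome carrying region B
    strand_b: int  # +1 or -1: transcriptional orientation in region B
    score: float   # log-normalised Smith-Waterman score (assumed precomputed)


def is_candidate_block(hits: Sequence[Hit], min_hits: int = 3,
                       max_gap: int = 30, min_score: float = 17.5) -> bool:
    """Check a set of paired hits against the listed criteria."""
    kept = sorted((h for h in hits if h.score >= min_score),
                  key=lambda h: h.pos_a)
    if len(kept) < min_hits:
        return False

    # +1 for a direct block, -1 for an inverted one; must be the same
    # for every hit (conservation of transcriptional orientation).
    rel = kept[0].strand_a * kept[0].strand_b
    if any(h.strand_a * h.strand_b != rel for h in kept):
        return False

    for prev, cur in zip(kept, kept[1:]):
        step_a = cur.pos_a - prev.pos_a
        step_b = cur.pos_b - prev.pos_b
        # Gene order: B must advance in the direction implied by `rel`.
        if step_a <= 0 or step_b * rel <= 0:
            return False
        # Spacing: consecutive hits at most `max_gap` genes apart.
        if step_a > max_gap or abs(step_b) > max_gap:
            return False
    return True


direct = [Hit(10, 1, 100, 1, 20.0),
          Hit(15, -1, 108, -1, 18.0),
          Hit(30, 1, 120, 1, 25.0)]
print(is_candidate_block(direct))  # True
```

This only tests a given set of hits; it does not search the genome for blocks, and it does not handle the exclusion of subtelomeric repeats.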
Dot matrix plots:
To help visualise the duplicated blocks, we have produced dot
matrix plots for every pair of chromosomes, and "zoom-in"
views for the 55 duplicated regions we identified. These show BLASTP protein
hits (with scores >= 200), plotted at the position where the genes lie
on each chromosome. Duplicated chromosomal regions appear as diagonal series
of points. Different symbols are used to indicate the transcriptional orientations
of genes.
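To make the construction of such a plot concrete, the following Python sketch turns a list of hits between two chromosomes into plottable points. The `PlotHit` type, the `dot_plot_points` function, and the two symbols ("o" for genes in the same orientation, "x" for opposite orientations) are assumptions chosen for this example; the actual symbols used in our plots are not described here.

```python
from typing import List, NamedTuple, Sequence, Tuple


class PlotHit(NamedTuple):
    pos_x: int     # gene position on the chromosome drawn on the x axis
    strand_x: int  # +1 or -1 transcriptional orientation
    pos_y: int     # gene position on the chromosome drawn on the y axis
    strand_y: int  # +1 or -1 transcriptional orientation
    score: float   # BLASTP score of the hit


def dot_plot_points(hits: Sequence[PlotHit],
                    min_score: float = 200.0) -> List[Tuple[int, int, str]]:
    """Return (x, y, symbol) for every hit scoring at least `min_score`."""
    points = []
    for h in hits:
        if h.score < min_score:
            continue
        # Hypothetical symbol scheme: "o" same orientation, "x" opposite.
        symbol = "o" if h.strand_x == h.strand_y else "x"
        points.append((h.pos_x, h.pos_y, symbol))
    return points


example = [PlotHit(5, 1, 40, 1, 350.0),
           PlotHit(6, -1, 41, 1, 200.0),
           PlotHit(7, 1, 42, 1, 199.0)]
print(dot_plot_points(example))  # [(5, 40, 'o'), (6, 41, 'x')]
```

A duplicated region then shows up as a run of points whose x and y positions rise (or fall) together, which is what produces the diagonal series described above.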
Previous work by other labs:
Some of the duplicated regions have already been reported by other labs;
references are given below. Most of these papers describe only part of a
duplicated block. Please e-mail Ken Wolfe if you know of others. See also
the papers in the Nature Yeast Genome Directory.
Block 1: Bussey, H. et al. Proc. Natl. Acad. Sci. U.S.A. 92,
3809-3813 (1995); Johnston, M. et al. Science 265, 2077-2082 (1994);
Steensma, H.Y. et al. Curr. Genet. 16, 131-137 (1989).
Block 2: Parle-McDermott, A.G., Hand, N.J., Goulding, S.E. &
Wolfe, K.H. Yeast 12, 999-1004 (1996); Pearson, B.M., Hernando, Y.,
Payne, J., Wolf, S.S., Kalogeropoulos, A. & Schweizer, M. Yeast
12, 1021-1031 (1996); Purnelle, B. & Goffeau, A. Yeast 12, 1475-1481
(1996); Storms, R.K. et al. Genome 40, 151-164 (1997).
Block 3: Wolfe, K.H. & Lohan, A.J.E. Yeast 10, S41-S46
(1994).
Block 4: Logghe, M., Molemans, F., Fiers, W. & Contreras, R.
Yeast 10, 1093-1100 (1994); Berroteran, R.W. & Hampsey, M. Yeast
11, 761-766 (1995).
Block 11: Lalo, D., Stettler, S., Mariotte, S., Slonimski, P.P. &
Thuriaux, P. C.R. Acad. Sci. Paris 316, 367-373 (1993); Lalo, D.
et al. Yeast 10, 523-533 (1994).
Block 12: Wolfl, S., Hanemann, V. & Saluz, H.P. Yeast
12, 1549-1554 (1996).
Block 25: Kail, M., Juttner, E. & Vaux, D. Yeast 12, 799-807
(1996).
Block 28: Melnick, L. & Sherman, F. J. Mol. Biol. 233,
372-388 (1993); McKnight, G.L., Cardillo, T.S. & Sherman, F. Cell
25, 409-419 (1981); Kang, H.A., Schwelberger, H.G. & Hershey, J.W.B.
Mol. Gen. Genet. 233, 487-490 (1992).
Block 39: Pohlmann, R. & Philippsen, P. Yeast 12, 391-402
(1996); Nasr, F., Becam, A.-M. & Herbert, C.J. Yeast 12, 493-499
(1996).
Block 40: Galibert, F. et al. EMBO J. 15, 2031-2049 (1996).
Block 41: Galibert, F. et al. EMBO J. 15, 2031-2049 (1996);
Katsoulou, C., Tzermia, M., Tavernarakis, N. & Alexandraki, D. Yeast
12, 787-797 (1996).
Block 42: Galibert, F. et al. EMBO J. 15, 2031-2049 (1996);
Huang, M.-E., Manus, V., Chuat, J.-C. & Galibert, F. Yeast 12,
869-875 (1996).
Block 43: Wente, S.R., Rout, M.P. & Blobel, G. J. Cell Biol.
119, 705-723 (1992).
Block 47: Pearson, B.M., Hernando, Y., Payne, J., Wolf, S.S., Kalogeropoulos,
A. & Schweizer, M. Yeast 12, 1021-1031 (1996).
Block 49: Molenaar, C.M.T. et al. Nucleic Acids Res. 12, 7345-7358
(1984).
A tetraploid origin for the yeast genome was suggested in 1987 by M. Mitchell
Smith (J. Mol. Evol. 24, 252-259), based mainly on analysis of duplicate
histone loci.
The yeast gene duplications project is supported by:
- The Fourth Framework Biotechnology Programme of the European Union
- The Wellcome Trust
- Forbairt, the Irish Science and Technology Agency
- The University of Dublin, Trinity College
last updated: 10 Dec 1999