Statistical analysis of the exon-intron structure and splicing sites of several eukaryotes

Kriventseva E.V., Gelfand M.S.1

Institute of Molecular Biology, Russian Acad. Sci., Moscow;

1Institute of Protein Research, Russian Acad. Sci., Pushchino, 142292, Russia. E-mail: misha@imb.imb.ac.ru.

We have analyzed the exon-intron structure and splicing sites of a vertebrate (human), an insect (Drosophila), monocot (maize) and dicot (Arabidopsis thaliana) plants, yeast (Saccharomyces cerevisiae), filamentous fungi (Aspergillus sp.) and protists (Apicomplexa, including Plasmodium sp.).

Yeast exon-intron structures possess a number of unique features. A yeast gene usually has at most one intron. The branch site is strongly conserved, whereas the acceptor site is rather weak. Long yeast introns tend to have stronger acceptor sites.

In the other species there is an almost universal correlation between the lengths of neighboring exons (in all samples except protists) and a correlation between the lengths of neighboring introns (in the human, Drosophila and protist samples). On average, first introns are longer, and anomalously long introns are usually the first introns of a gene. Exon length correlates positively with the strength of the splicing sites at its boundaries (in the human, Drosophila and plant samples).
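
As an illustration of the neighboring-length correlation described above, the following minimal sketch pairs each exon with the next exon of the same gene and computes a correlation coefficient over all pairs. The abstract does not state which correlation measure was used; the Pearson coefficient, the function names and the exon lengths below are assumptions made for illustration only.

```python
from math import sqrt

def neighbor_pairs(genes):
    """Collect (length of exon i, length of exon i+1) over all genes."""
    return [(a, b) for lengths in genes for a, b in zip(lengths, lengths[1:])]

def pearson(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs.

    Raises ZeroDivisionError if either coordinate has zero variance.
    """
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy)

# Made-up exon lengths (bp) for three hypothetical genes, not real data.
genes = [[120, 150, 140], [60, 75], [300, 280, 310, 290]]
r = pearson(neighbor_pairs(genes))
```

The same two functions could be applied to lists of intron lengths to examine neighboring introns.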

There is a universal preference for exons and exon pairs with the (total) length divisible by 3. Introns positioned between codons are preferred, whereas those positioned between the first and second codon positions are avoided.
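
The intron positions mentioned above can be expressed as intron phases: the number of coding bases preceding the intron, modulo 3. Phase 0 is an intron between codons, phase 1 an intron between the first and second codon positions, and phase 2 an intron between the second and third. The sketch below assumes that each gene is given as a list of the coding lengths of its exons (untranslated parts excluded); the function names and the example lengths are hypothetical.

```python
from collections import Counter
from itertools import accumulate

def intron_phases(cds_exon_lengths):
    """Phase of each intron: coding bases before the intron, modulo 3."""
    # One intron follows every exon except the last one.
    return [total % 3 for total in accumulate(cds_exon_lengths[:-1])]

def phase_counts(genes):
    """Count introns of phase 0, 1 and 2 over a collection of genes."""
    counts = Counter()
    for lengths in genes:
        counts.update(intron_phases(lengths))
    return counts

def fraction_divisible_by_3(genes):
    """Fraction of exons whose coding length is a multiple of 3."""
    lengths = [n for gene in genes for n in gene]
    return sum(1 for n in lengths if n % 3 == 0) / len(lengths)

# Made-up coding exon lengths (bp) for two hypothetical genes.
genes = [[120, 87, 201], [64, 155, 90]]
counts = phase_counts(genes)
```

An exon flanked by two introns of the same phase has a length divisible by 3, so the two preferences are related.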

Introns are more AT-rich than exons. The choice of A or G at the third position of the intron (the donor splice site consensus has R in this position) is correlated with the overall GC-composition of the gene.
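
A minimal sketch of the two quantities involved in this comparison, assuming sequences are given as plain strings and that an intron string starts at the first base of the donor site (so its third base is the R position). The function names and the toy sequences are hypothetical.

```python
def at_fraction(seq):
    """Fraction of A and T among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    known = sum(seq.count(base) for base in "ACGT")
    if known == 0:
        return float("nan")
    return (seq.count("A") + seq.count("T")) / known

def intron_third_base(intron):
    """Base at the third position of the intron (R in the donor consensus)."""
    return intron[2].upper()

# Toy sequences for illustration only, not real data.
exon = "ATGGCCGCGCTGAAG"
intron = "gtaagtattttctaatttcag"
exon_at = at_fraction(exon)
intron_at = at_fraction(intron)
third = intron_third_base(intron)
```

Tabulating `third` against the GC content of each gene (one minus `at_fraction` over the whole gene) would give the kind of contingency data on which such a correlation could be tested.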

In all samples the dinucleotide AG is avoided in the region preceding the acceptor site.
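
One simple way to quantify such avoidance is an observed-to-expected ratio for AG in the upstream windows, where the expectation comes from the mononucleotide frequencies of each window. The abstract does not describe the statistic actually used; this measure, the function name and the requirement that each window end just before the acceptor AG itself are assumptions of this sketch.

```python
def ag_observed_to_expected(upstream_regions):
    """Ratio of observed AG count to the count expected from base frequencies.

    Each region is the sequence immediately upstream of an acceptor site,
    excluding the acceptor AG itself. Values below 1 suggest avoidance.
    """
    observed = 0
    expected = 0.0
    for seq in upstream_regions:
        seq = seq.upper()
        n = len(seq)
        if n < 2:
            continue
        # Count AG at every dinucleotide position, overlaps included.
        observed += sum(1 for i in range(n - 1) if seq[i:i + 2] == "AG")
        # Expected count if A and G occurred independently at their
        # observed frequencies in this window.
        expected += (n - 1) * (seq.count("A") / n) * (seq.count("G") / n)
    return observed / expected if expected else float("nan")
```

For a real sample the windows would be cut at a fixed length (for example, a few tens of bases) upstream of each annotated acceptor site; that length is a free parameter here.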

This study was partially supported by grants from the Russian Foundation for Basic Research and the State Scientific Program "Human Genome".