Mathematical analysis of overlapping genes

Kozlov N.N.

Keldysh Institute of Applied Mathematics, Russian Academy of Sciences, 125047, Moscow, Miusskaya Sq. 4, Russia

Overlapping genes within a single RNA chain are investigated. The inverse problem is posed: to compute all possible nucleotide sequences corresponding to protein sequences whose genes overlap. Its solutions for double (Theorem 1) and triple (Theorem 2) overlaps are presented. From Theorem 1 it follows that, for double overlaps, 286 distinct local overlaps can be identified. Each of these overlaps determines one or two positions, together with the type of nucleotide substitution at them, that result in silent mutations. Of these overlaps, 187 contain termination codons (ter), and 99 involve codons of leucine (Leu) or arginine (Arg). From Theorem 2 it follows that no positions of this type occur in a region of triple overlap. The specific features of nucleotide occurrences at the positions under examination are studied. For the genomes that contain the longest regions of genetic overlap (two groups of viruses, HBV and HIV), the non-random nature of these occurrences is established. Possible reasons for this non-randomness, as well as features of local overlaps, are discussed. For the codon families ter, Leu, and Arg, the special properties that give rise to the positions under consideration have been studied. Owing to the structure of these families and of the serine (Ser) family, the degeneracy of the universal genetic code is not confined to the third base of the codon. The specific features of the use of the Ser, Leu, and Arg codon families in double and triple overlaps have been studied. The analysis leads to a hypothesis about the origin of these codons. It is suggested that the final "choice" of the six triplet codons corresponding to each of these amino acids may be related to the evolution of DNA molecules at the stage when, according to contemporary conceptions, restrictions on genome size began to exert an influence and overlapping genes appeared. The paper demonstrates that this "choice" could not have been independent of the "choice" of ter codons.
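As a rough illustration of the inverse problem and of the codon-family properties mentioned above, the following Python sketch works with the standard (universal) genetic code. It is a brute-force enumeration for very short sequences, not the construction behind Theorems 1 and 2, and all function names (codons_for, translate, silent_outside_third_base, overlapping_sequences) are hypothetical, introduced here only for illustration.

```python
from itertools import product

# Standard genetic code, RNA alphabet; '*' marks a termination codon (ter).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: aa for (a, b, c), aa in zip(product(BASES, repeat=3), AMINO)}


def codons_for(aa):
    """All codons of one family (one-letter amino acid code, '*' for ter)."""
    return [codon for codon, residue in CODE.items() if residue == aa]


def translate(rna, frame=0):
    """Translate every complete codon of rna, starting at offset frame."""
    return "".join(CODE[rna[i:i + 3]] for i in range(frame, len(rna) - 2, 3))


def silent_outside_third_base():
    """Single substitutions at the 1st or 2nd codon base that keep the meaning."""
    hits = set()
    for codon, aa in CODE.items():
        for pos in (0, 1):
            for base in BASES:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if base != codon[pos] and CODE[mutant] == aa:
                    hits.add((codon, mutant, aa))
    return hits


def overlapping_sequences(protein_a, protein_b, shift):
    """Brute-force inverse problem for a double overlap (assumed toy setting).

    Returns every RNA string that reads protein_a in frame 0 and
    protein_b in the frame shifted by `shift` nucleotides (1 or 2).
    """
    found = []
    for codons in product(*(codons_for(aa) for aa in protein_a)):
        rna = "".join(codons)
        if translate(rna, shift) == protein_b:
            found.append(rna)
    return found


# Families encoded by six codons.
six_fold = sorted(aa for aa in set(AMINO) if len(codons_for(aa)) == 6)
print(six_fold)  # ['L', 'R', 'S']

# Families with a silent substitution outside the third base.
print(sorted({aa for _, _, aa in silent_outside_third_base()}))  # ['*', 'L', 'R']

# Toy double overlap: Met-Leu in frame 0, Cys in frame +1.
print(overlapping_sequences("ML", "C", 1))
```

In this toy case all six back-translations of Met-Leu also read Cys in the shifted frame: the first base of the Leu codon (U or C) is the third base of the Cys codon (UGU or UGC), so exchanging U and C at that position is silent in both reading frames. The enumeration grows exponentially with protein length, so it is suitable only for checking small local overlaps by hand.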

This research was supported by the Russian Foundation for Basic Research, grants No. 98-01-00059 and No. 96-15-97229.