Hendry et al.

A Stereochemical Rationale For the Genetic Code Derived From Complementary Fit Of Amino Acids Into Cavities Formed In Codon/Anticodon Sequences in Double Stranded DNA: Further Evidence Based Upon Noncomplementarity of Untranslated Amino Acids.

Lawrence B. Hendry,* Virendra B. Mahesh,* Edwin D. Bransome, Jr.#, Marion S. Hutson+ and Lillian K. Campbell**

*Drug Design and Development Laboratory, Department of Physiology and Endocrinology, #Department of Medicine, +Department of Pathology, Medical College of Georgia, Augusta, Georgia 30912 and **Scottish Rite Children's Hospital, 1641 S. Ponce de Leon, Atlanta, Georgia 30307 U.S.A.

Correspondence should be addressed to: larryh@therock.mcg.edu
Submitted for publication: August 1995

Keywords: nucleic acids, genetic code, amino acids, DNA, RNA, complementarity


Abstract

Computer modeling was employed to demonstrate that the amino acid L-isoleucine fits remarkably well into the apyrimidinic site in DNA, 5' A_C 3', 3' TAG 5', derived by removing the second codon nucleotide (T) from its double stranded codon/anticodon triplet. The complex 5' A-ILE-C 3', 3' TAG 5' exhibited remarkable complementarity of van der Waals surfaces as well as highly favorable electrostatic interactions, as measured by energy calculations. Alterations in the natural amino acid, including changes in chirality to configurations not occurring in protein, resulted in poorly fitting structures. These findings confirm earlier studies made with physical models and provide strong evidence that the genetic code has a stereochemical basis.

Introduction

Understanding the relationships between proteins and nucleic acids is of fundamental importance in unraveling the function, regulation and evolution of genes. Historically, it was not long after Watson and Crick's publication of the structure of the double helical B form of DNA (1) that Gamow (2) raised the question of whether there were unique lock-and-key relationships between the structures of the twenty amino acids occurring naturally in proteins and the four bases in DNA. Subsequently, the laboratories of Nirenberg and Ochoa discovered that amino acids were coded by different triplet sequences of nucleic acid bases, which were named codons (3, 4). The position of amino acids in a given protein sequence (amino to carboxyl) could be read from the sequence of codons (5' to 3'). The assignments of amino acids to particular codons have since been firmly established in the genetic code table.

Gamow's belief that there must be structural relationships between nucleic acids, amino acids and proteins was shared by a large number of early investigators, most notably Woese (5). Various experimental and theoretical approaches have been reported, including suggestions that amino acids might interact with either codons or anticodons (6, 7 and references therein). In a 1968 landmark paper on the origin of the genetic code, Crick (8) stated that it was "...essential to pursue the stereochemical theory". Because a clear stereochemical model was not available at that time, however, Crick also hypothesized that the genetic code could be a frozen accident of evolution and thus uninterpretable today. He further suggested that any satisfactory model should measure amino acid/nucleic acid interactions in terms of binding constants. Perhaps his most insightful suggestion was that "it might be more useful to consider which amino acids are not used in the code".

Our laboratory has employed various physical models to demonstrate relationships between amino acids and codon/anticodon bases. Initially, amino acid side chains were observed to fit into cavities between base pairs in partially unwound single stranded RNA as well as partially unwound double stranded RNA (9). In many cases, the amino acids fit particularly well into sites constructed from the first two bases of their codons and anticodons. Structural analogies between amino acid side chains and nucleic acid bases were then discovered, leading to the idea of replacing a base in DNA and/or RNA with the amino acid (10). In subsequent studies, the Watson and Crick B form of DNA was employed because the regularity of this conformation of the double helix, and in particular its symmetry about the dyad axis, made it straightforward to compare the amino acid/nucleic acid complexes with one another. Physical models of cavities in double stranded DNA from which a base had been removed were constructed, leading to the discovery that amino acids were excellent fits into certain cavities (7, 11). The amino acids fit best into the cavities derived from their codon/anticodons. Poor fits were generally found when the amino acids were inserted into cavities not derived from their codons. Structurally altered amino acids, which are not translated into protein, were poor fits. In these early studies, it was not possible to quantitate the degree of fit of amino acids, and until recently computational methods that could reliably and rigorously quantitate interactions between molecular structures were unavailable.

In this study, computer modeling was employed to reinvestigate our initial observations of the stereochemical basis for the genetic code, which were based upon relatively primitive physical models. Our interest in the code problem was rekindled by two developments. First, Harris, Sullivan and Hickok (12) have reported new evidence supporting a stereochemical theory for the origin of the genetic code, based upon a study of the interaction of the DNA binding domain of the glucocorticoid receptor protein with the glucocorticoid response element sequence in DNA. These findings suggest that precise stereochemical relationships are conserved between selected amino acid residues in certain regulatory proteins and the nucleic acid sequences of the genes that they regulate. The relationships were found to directly reflect the genetic code. Specifically, those amino acids of the protein which recognize the gene appear to be oriented in a manner allowing for direct interaction with sequences that contain their codons. Second, modern computational methods, including standard force field energy calculations, have progressed to the point that accurate, reproducible and quantitative measurements of the interactions of molecules can now be readily obtained. Such calculations are based upon the fundamental physicochemical principle of complementarity originally defined by Pauling and Delbruck (13), i.e. favorable steric interactions of van der Waals surfaces and electrostatic attraction of suitably oriented atoms and functional groups.

At the outset, we wish to point out to the reader that any study which examines structural relationships between amino acids and nucleic acids will be biased by the existence of the known genetic code table. To avoid this problem, both the naturally occurring amino acid residue and structures not existing in protein were examined, using L-isoleucine and various structural isomers as examples. The energy resulting from the interactions of these amino acids with codon/anticodon sites was measured. Isoleucine was chosen because, in addition to the L-configuration of its alpha carbon, it has a chiral (asymmetric) side chain, and only one of its stereoisomers is translated into protein. It was thus possible to examine the fit of the other stereoisomers. The rationale of varying the amino acid structure and quantitating interactions with a given codon/anticodon site directly addresses the concerns of Crick stated above.
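
Because isoleucine has two stereocenters (C2, the alpha carbon, and C3 in the side chain), it has four stereoisomers. The short Python sketch below is illustrative only and not part of the authors' workflow; the table and function names are hypothetical. It enumerates the isomers by their CIP descriptors and classifies each relative to natural L-isoleucine (2S,3S): inverting one center gives an epimer (a diastereomer), while inverting both gives the enantiomer.

```python
from itertools import product

# Standard names of the isoleucine stereoisomers, keyed by the
# CIP descriptors of (C2 = alpha carbon, C3 = side-chain carbon).
ISOLEUCINE_STEREOISOMERS = {
    ("S", "S"): "L-isoleucine",
    ("S", "R"): "L-alloisoleucine",
    ("R", "S"): "D-alloisoleucine",
    ("R", "R"): "D-isoleucine",
}

NATURAL = ("S", "S")  # the only isomer translated into protein


def relationship(reference, other):
    """Classify two configurations of a molecule with two stereocenters."""
    inverted = sum(a != b for a, b in zip(reference, other))
    if inverted == 0:
        return "identical"
    if inverted == len(reference):
        return "enantiomer"
    return "epimer (diastereomer)"


def enumerate_isomers(reference=NATURAL):
    """Return (name, relationship to reference) for all 2**2 configurations."""
    return [
        (ISOLEUCINE_STEREOISOMERS[config], relationship(reference, config))
        for config in product("SR", repeat=2)
    ]


if __name__ == "__main__":
    for name, rel in enumerate_isomers():
        print(f"{name:18s} {rel}")
```

Run as a script, it reports L-alloisoleucine and D-alloisoleucine as epimers of L-isoleucine and D-isoleucine as its enantiomer; these are the three stereoisomeric variants examined in the Results.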

Here, we report further evidence that the genetic code has a stereochemical basis. This conclusion is based upon the finding that L-isoleucine fits remarkably well into the apyrimidinic site derived from its codon/anticodon, and that the fit is highly specific, as evidenced by the relatively poor fit of other structural isomers. The results of a study of all of the translated amino acids and their fit into all 64 possible sites using this approach will be reported in subsequent papers.

Materials and Methods

Molecular modeling was conducted on a Silicon Graphics Indigo Extreme computer equipped with stereoviewing, using Sybyl 6.04 software (Tripos Associates, St. Louis, MO). All structures were constructed with the Biopolymer module of the Sybyl program and assigned Kollman all atom charges. Double stranded DNA triplets were constructed in the Watson and Crick canonical B form. Apurinic/apyrimidinic sites were formed by removing the middle nucleotide (base with deoxyribose and 5' phosphate) from the triplets. Energy calculations were performed with the Sybyl force field using a van der Waals radius of 1.2 Å for hydrogen. While the relative positions of the bases were kept intact, the torsional angles of the remaining backbone were adjusted to permit maximal insertion of a given amino acid side chain into the site. Concomitantly, the electrostatic interaction (salt bridge) between the negatively charged 5'-phosphate oxygen of the 3' base bordering the site and the positively charged alpha amino group of the amino acid was optimized. Unnatural isomers were created by altering the various amino acid structures and were then minimized using the force field.

Insertion of each candidate amino acid into DNA was accomplished in stereo using van der Waals surfaces, including 1.0 solvent accessible Connolly surfaces of the apurinic/apyrimidinic cavities. Poor steric contacts were minimized using autodocking. The docking procedure was repeated several times to optimize the distances and directions of potential hydrogen bonds and salt bridges as well as to maximize van der Waals interactions. The conformations of the amino acids were also adjusted to maximize steric contact. The relative fit, or complementarity, of each amino acid was calculated by measuring the optimal favorable energy change resulting from docking. A convenient method was to perform the docking procedure, define the amino acid and DNA separately as aggregates, and merge the molecules into a single complex. The change in van der Waals energy was used as a measure of steric complementarity; the change in electrostatic energy, computed using donor hydrogens and acceptor heteroatoms, was used to assess electrostatic complementarity. Thus, the more negative the energy change resulting from complex formation, the more stable the complex and the better the fit. The total fit of each ligand was evaluated by summing the changes (in kcal) in electrostatic and van der Waals energies.
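
The scoring described above amounts to an interaction energy: for each force field term, the energy of the merged complex minus the energies of the DNA and amino acid aggregates computed separately, with the total fit taken as the sum of the electrostatic and van der Waals changes. A minimal Python sketch follows, assuming the component energies have already been obtained from the force field program (Sybyl in this study); the `Energies` container and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Energies:
    """Force field energy terms (kcal) for one aggregate or complex."""
    electrostatic: float
    van_der_waals: float


def interaction(complex_: Energies, dna: Energies, ligand: Energies) -> Energies:
    """Change on complex formation, per term: E(complex) - E(DNA) - E(amino acid)."""
    return Energies(
        electrostatic=complex_.electrostatic - dna.electrostatic - ligand.electrostatic,
        van_der_waals=complex_.van_der_waals - dna.van_der_waals - ligand.van_der_waals,
    )


def total_fit(delta: Energies) -> float:
    """Total fit as the sum of electrostatic and van der Waals changes.

    More negative values indicate a more stable complex and a better fit.
    """
    return delta.electrostatic + delta.van_der_waals
```

For the L-isoleucine complex reported in the Results, an electrostatic change of -23.578 kcal and a van der Waals change of -14.233 kcal sum to a total fit of -37.811 kcal.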

Before proceeding to the results, it should be noted that the currently available computational methods have certain inherent limitations. To date, it has not been possible to examine all other possible sites or conformations of DNA and/or RNA. Water also needs to be considered, since it is expected to play an important role in the specificity of fits of amino acids into the apurinic/apyrimidinic sites. The amino acid codon/anticodon complexes, including their solvent shells, should also be examined with molecular dynamics.

Results

The creation of an apyrimidinic site from double stranded DNA using computer modeling is depicted with skeletal and space filling models in Figures 1 and 2, respectively. The site was created by removal of the center nucleotide in a triplet sequence. In the example shown, the specific double stranded sequence used to construct the cavity, i.e. 5' ATC 3', 3' TAG 5', is a codon/anticodon triplet for isoleucine. Note that the cavity formed by removal of the nucleotide T is bordered by: the first codon base (A) along with its attached deoxyribose; the third codon base (C) with its attached deoxyribose and 5' phosphate group; and the middle base of the anticodon (A). The orientation, van der Waals surfaces and electrostatic properties of these bases, along with the sugar-phosphate backbone, determine the overall shape and physicochemical characteristics of the cavity. The predominant electrostatic feature of any given cavity is the negatively charged phosphate oxygen of the third codon base. Thus, for a candidate molecule to fit well within the site according to the well-established principles of complementarity (13), it should form a suitable charge interaction with the phosphate group as well as conform to the surface characteristics of the cavity. For this reason, when amino acids were docked into the site, the positively charged alpha amino group was positioned so that it could form a salt bridge to the phosphate oxygen while the van der Waals interaction of the side chain with the cavity surface was maximized.
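
The construction of a site from a codon can be written out as simple sequence bookkeeping. The Python sketch below (all names hypothetical; it handles sequence labels only, not three-dimensional coordinates) pairs each codon base with its Watson-Crick complement to give the anticodon strand written 3' to 5', removes the middle codon nucleotide, labels the site apurinic or apyrimidinic according to the base removed, and lists the groups the text identifies as bordering the cavity.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
PURINES = {"A", "G"}


def anticodon_strand(codon: str) -> str:
    """Complementary strand, written 3'->5' beneath the 5'->3' codon."""
    return "".join(COMPLEMENT[base] for base in codon)


def build_site(codon: str) -> dict:
    """Describe the cavity left by removing the middle codon nucleotide."""
    codon = codon.upper()
    if len(codon) != 3 or any(base not in COMPLEMENT for base in codon):
        raise ValueError(f"expected a DNA triplet, got {codon!r}")
    anticodon = anticodon_strand(codon)
    removed = codon[1]
    return {
        "site": f"5' {codon[0]}_{codon[2]} 3', 3' {anticodon} 5'",
        "type": "apurinic" if removed in PURINES else "apyrimidinic",
        "removed": removed,
        "borders": [
            f"first codon base {codon[0]} with its deoxyribose",
            f"third codon base {codon[2]} with its deoxyribose and 5' phosphate",
            f"middle anticodon base {anticodon[1]} (now unpaired)",
        ],
    }


def complex_label(codon: str, residue: str) -> str:
    """Notation for an amino acid occupying the site, e.g. 5' A-ILE-C 3'."""
    codon = codon.upper()
    return f"5' {codon[0]}-{residue}-{codon[2]} 3', 3' {anticodon_strand(codon)} 5'"
```

For example, `build_site("ATC")` yields the apyrimidinic site 5' A_C 3', 3' TAG 5' bordered by the unpaired anticodon base A, and `complex_label("TGG", "TRP")` yields 5' T-TRP-G 3', 3' ACC 5', the apurinic tryptophan complex named in the Discussion.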

When attempts were made to dock L-isoleucine into the site 5' A_C 3', 3' TAG 5', the side chain was found to be approximately the same size and shape as the cavity (Figures 1C, 2C). A complex 5' A-ILE-C 3', 3' TAG 5' could be constructed in which a salt bridge was formed and the sec-butyl side chain fit completely within the cavity. Moreover, the van der Waals surfaces of the middle anticodon base and the side chain made contacts that were strikingly complementary to one another. To better assess the shape of the cavity, a Connolly (solvent accessible) surface was created using the atoms most closely bordering the site (Figure 3). L-isoleucine fits very well into the Connolly surface (Figures 3C and 4A). When measured with energy calculations, L-isoleucine formed an electrostatic interaction (salt bridge) of -23.578 kcal and a van der Waals interaction of -14.233 kcal within the site. The total interaction energy in the 5' A-ILE-C 3', 3' TAG 5' complex was -37.811 kcal.

Attempts to dock other isomers of L-isoleucine into DNA were not as successful (Figure 4). In no case was it possible to fully insert these isomers into the cavity in the manner of L-isoleucine without very high van der Waals repulsion, ca. +550 kcal to > +8000 kcal. The best fits of these isomers in the site were as follows. L-alloisoleucine had a -22.257 kcal electrostatic interaction and a -11.496 kcal van der Waals interaction, for a total of -33.753 kcal. As shown in Figure 4B, relatively little of the side chain was capable of contacting the surface of the cavity bordered by the base A of the anticodon. Changing the configuration of the alpha carbon, to give D-alloisoleucine, also did not permit full insertion into the cavity. D-alloisoleucine had a -15.026 kcal electrostatic interaction and a -8.368 kcal van der Waals interaction, for a total of -23.394 kcal. Even less of the side chain of this epimer could fit within the site (Figure 4C). The worst fitting stereoisomer was D-isoleucine, which had a -8.711 kcal electrostatic interaction and a -8.323 kcal van der Waals interaction, for a total of -17.034 kcal (Figure 4D). The L-tertiary butyl structural isomer of L-isoleucine could not be fitted into the site because of the bulkiness of its side chain (Figure 4E). This L-t-butylisoleucine isomer had a -9.260 kcal electrostatic interaction and a -7.436 kcal van der Waals interaction, for a total of -16.696 kcal. The structural homolog of L-isoleucine with a carbon added between the alpha carbon and the sec-butyl side chain (L-homoisoleucine) was also a poor fit. It was not possible to form a reasonable salt bridge, and the side chain could only be partially inserted into the site (Figure 4F). L-homoisoleucine had a -3.968 kcal electrostatic interaction and a -10.018 kcal van der Waals interaction, for a total of -13.986 kcal.
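
The energies reported above can be tabulated and ranked in a few lines, which also checks that each total equals the sum of its electrostatic and van der Waals components. The values are transcribed from the text; the script itself is an illustrative sketch, not part of the original analysis, and `REPORTED` and `ranked` are hypothetical names.

```python
# (electrostatic, van der Waals, reported total) in kcal, transcribed from the Results.
REPORTED = {
    "L-isoleucine": (-23.578, -14.233, -37.811),
    "L-alloisoleucine": (-22.257, -11.496, -33.753),
    "D-alloisoleucine": (-15.026, -8.368, -23.394),
    "D-isoleucine": (-8.711, -8.323, -17.034),
    "L-t-butylisoleucine": (-9.260, -7.436, -16.696),
    "L-homoisoleucine": (-3.968, -10.018, -13.986),
}


def ranked(data=REPORTED, tol=1e-6):
    """Verify each total, then sort from best (most negative) to worst fit."""
    rows = []
    for name, (elec, vdw, total) in data.items():
        if abs((elec + vdw) - total) > tol:
            raise ValueError(f"{name}: components do not sum to the reported total")
        rows.append((name, total))
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    rows = ranked()
    best = rows[0][1]
    for name, total in rows:
        print(f"{name:20s} {total:8.3f}  ({total - best:+.3f} vs best)")
```

All six reported totals are internally consistent, and the ranking places L-isoleucine first, followed by L-alloisoleucine, D-alloisoleucine, D-isoleucine, L-t-butylisoleucine and L-homoisoleucine.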

Discussion

Computer modeling, including graphics and energy calculations, has clearly shown that L-isoleucine is an excellent fit into the apyrimidinic site created by removal of the middle nucleotide (T) from the double stranded DNA sequence 5' ATC 3', 3' TAG 5'. L-isoleucine exhibits highly favorable interactions within the apyrimidinic DNA cavity 5' A_C 3', 3' TAG 5', demonstrated by the simultaneous formation of: a strong electrostatic interaction, manifest in a salt bridge between the alpha amino group of the amino acid and a phosphate oxygen of the third nucleotide (C); and complementary van der Waals surfaces between the side chain and the surrounding bases, in particular the unpaired base (A). These results confirm earlier studies based upon physical models which demonstrated that translated amino acids fit into apurinic/apyrimidinic sites (7, 10, 11). The apyrimidinic site in DNA chosen for this study was derived from a codon/anticodon triplet sequence for L-isoleucine. The remarkable complementarity manifest in the 5' A-ILE-C 3', 3' TAG 5' complex is consistent with our prior observations and supports the notion that the genetic code has a stereochemical basis.

As stated in the Introduction, prior knowledge of the genetic code table can create an inherent bias in any study of the relationships between amino acids and their codons or anticodons. This investigation was specifically designed to avoid such bias using the rationale of Crick (8), namely, to examine amino acid structures which are not translated into protein. Using L-isoleucine as an example, all of its possible stereoisomers, a structural isomer and a homolog were examined for fit into the apyrimidinic site derived from an isoleucine codon/anticodon. None of the structural variants fit into the site as well as L-isoleucine, as summarized by the energy calculations plotted in Figure 5. These results provide strong and unequivocal evidence that the fit of isoleucine into the apyrimidinic cavity formed from its codon/anticodon cannot be fortuitous.

We have recently reported evidence, using similar computer modeling methodology, that the amino acid L-tryptophan fits very well into apurinic sites derived from its codons/anticodons (14). The amino acid/nucleic acid complexes formed, e.g. 5' T-TRP-G 3', 3' ACC 5' and 5' T-TRP-A 3', 3' ACT 5', revealed a high degree of specificity. Namely, poor fits were generally found when attempting to fit L-tryptophan into sites not derived from its codon/anticodons. For example, L-tryptophan would not fit into the site which accommodates L-isoleucine. Although not shown, L-isoleucine likewise does not fit well into the L-tryptophan site. Computer modeling studies in progress demonstrate that, in general, amino acids not only fit into apurinic/apyrimidinic sites derived from their codons/anticodons but that these fits are highly specific when measured with energy calculations. It is also of interest that apurinic/apyrimidinic sites in RNA appear to accommodate the amino acids at least as well as, if not better than, the respective sites in DNA.

Conclusions

It will undoubtedly take a long time to rigorously examine the fits of all translated amino acids, as well as structurally altered isomers, into all 64 possible apurinic/apyrimidinic sites. From the available data, it is reasonable to conclude that the genetic code has an underlying stereochemical rationale based upon the complementary fits of amino acids into apurinic/apyrimidinic sites derived from their double stranded codon/anticodon triplets. The observation that alterations in amino acid structure, including changes in chirality from the configurations translated into protein, result in poor fits provides strong support for the stereochemical theory and suggests that the genetic code table evolved from direct interactions of nucleic acids with candidate amino acids. These findings may also help explain the established conservation of amino acid residues in proteins which regulate certain genes, as described by Harris et al. (12).

It should not be surprising that the principles of physicochemical complementarity of molecules, and their predicted importance in biological function, originally described by Pauling and Delbruck would be applicable to understanding biological coding. In fact, it would be contradictory if Nature's current systems of biological regulation and transmission of genetic information did not follow these principles. We have proposed that such complementarity exists between nucleic acids and a wide variety of biologically active, naturally occurring small molecules (15). We also maintain that these complementary relationships represent a stereochemical logic inherent in gene structure which dictates constraints on biological structure, function, activity and metabolism (11).

Acknowledgements

We wish to thank others who have been involved in the studies on the stereochemical basis for the genetic code including Robert Ivarie, Douglas Ewing, John Henke, Francis Witham, Matt Petersheim, Kristin Douglas, Kerry Hendry, Bryan Hendry and Wendy Hendry. We also thank the Georgia Research Alliance for partial funding of computer hardware and software used in this study.

References

  1. Watson, J. D. and Crick, F. H. C. Molecular structure of nucleic acids. (1953) Nature 171, 737.

  2. Gamow, G. Possible relation between deoxyribonucleic acid and protein structures. (1954) Nature 173, 318.

  3. Nirenberg, M. W. and Matthaei, J. H. The dependence of cell-free protein synthesis in E. coli upon naturally occurring or synthetic polyribonucleotides. (1961) Proc. Natl. Acad. Sci. 47, 1588-1602.

  4. Lengyel, P. R., Speyer, J. R. and Ochoa, S. Synthetic polynucleotides and the amino acid code. (1961) Proc. Natl. Acad. Sci. 47, 1936-1942.

  5. Woese, C. R. The Genetic Code: The Molecular Basis For Genetic Expression. New York: Harper & Row, 1967.

  6. Lacey, J. C., Jr., Wickramasinghe, N. and Cook, G. W. Experimental studies on the origin of the genetic code and the process of protein synthesis: a review update. (1992) Origins of Life and Evolution of the Biosphere 22, 243-275.

  7. Hendry, L. B., Bransome, E. D., Jr., Hutson, M. S. and Campbell, L. K. A newly discovered stereochemical logic in the structure of DNA suggests that the genetic code is inevitable. (1984) Perspect. Biol. Med. 27, 623-651.

  8. Crick, F. H. C. The origin of the genetic code. (1968) J. Mol. Biol. 38, 367-379.

  9. Hendry, L. B. and Witham, F. H. Stereochemical recognition in nucleic acid-amino acid interactions and its implications in biological coding: a model approach. (1979) Perspect. Biol. Med. 22, 333-345.

  10. Hendry, L. B., Bransome, E. D. Jr. and Petersheim, M. Are there structural analogies between amino acids and nucleic acids? (1981) Origins of Life 11, 203-221.

  11. Hendry, L. B., Bransome, E. D., Jr., Hutson, M. S. and Campbell, L. K. First approximation of a stereochemical rationale for the genetic code based on the topography and physicochemical properties of "cavities" constructed from models of DNA. (1981) Proc. Natl. Acad. Sci. 78, 7440-7444.

  12. Harris, L. F., Sullivan, M. R. and Hickok, D. F. Conservation of genetic information: a code for site-specific DNA recognition. (1993) Proc. Natl. Acad. Sci. 90, 5534-5538.

  13. Pauling, L. and Delbruck, M. The nature of the intermolecular forces operative in biological processes. (1940) Science 92, 77-79.

  14. Hendry, L. B., Chu, C. K., Mahesh, V. B., Bransome, E. D., Jr., Hutson, M. S. and Campbell, L. K. (1994) Compumed 1994 Congress Proceedings, in press.

  15. Hendry, L. B. Drug design with a new type of molecular modelling based on stereochemical complementarity to gene structure. (1993) J. Clin. Pharmacol. 33, 1173-1187.

© 1995 Epress Inc.