Abbott Northwestern Hospital Cancer Research Laboratory, Minneapolis, MN 55407
Correspondence should be addressed to: Lester F. Harris, PhD.
Email: mg90601@sk.msc.edu
Reprinted with permission of the publisher.
Keywords: DNA protein complexes, molecular dynamics simulations, site-specific recognition
ABSTRACT
We used molecular dynamics simulations in solvent to investigate protein/DNA interactions between amino acids of the glucocorticoid receptor (GR) DNA binding domain (DBD) and the DNA of a glucocorticoid response element (GRE). We compared findings from a fully solvated 80 Angstrom water droplet GR DBD/GRE model with those from a 10 Angstrom water layer GR DBD/GRE model. We monitored hydrogen bonding interactions and calculated van der Waals and electrostatic interaction energies. Molecular dynamics simulations of both models yielded similar findings: amino acids of the GR DBD DNA recognition helix formed both direct and water mediated hydrogen bonds at cognate codon/anticodon nucleotide base sites within the GRE right major groove halfsite. Likewise, GR DBD amino acids in a beta strand structure adjacent to the DNA recognition helix formed both direct and water mediated hydrogen bonds at cognate codon/anticodon nucleotide base and backbone sites. We also investigated protein/DNA interactions in a 10 Angstrom water layer model consisting of the same GR DBD with a predicted alpha helix attached to its carboxyl terminus, docked at the same GRE extended with additional flanking nucleotides. In this model, amino acids of the DNA recognition helix and beta strand interacted with nucleotides of the GRE right major groove halfsite at the same cognate codon/anticodon nucleotide sites found in the two models above. In addition, amino acids within the predicted carboxyl-terminal alpha helix interacted at codon/anticodon nucleotide sites on the DNA backbone of the GRE flanking nucleotides. Together, these interactions induced breakage of Watson-Crick base pairing hydrogen bonds, resulting in DNA bending, strand elongation and unwinding events similar to those described for helicases.
INTRODUCTION
For specific DNA transcription to occur, regulatory proteins must recognize and bind specific sites on DNA. The nucleotide sequences of several specific DNA binding sites involved in regulating gene transcription have been described (1-4), suggesting that a code exists for recognition between DNA regulatory proteins and their cognate DNA binding sites. Biological experimentation has provided considerable information about regulatory protein/DNA interactions. In several of these systems, both prokaryotic and eukaryotic, a DNA recognition alpha helix has been observed within the protein's DNA binding domain (DBD) (1, 5-6). Sequence-specific DNA binding by regulatory proteins is known to result from multidentate hydrogen bonding and van der Waals interactions between DNA recognition helix amino acid sidechains and nucleotide base sites within the major groove halfsites of the cognate operator or response element DNA (1, 5, 7). However, the underlying mechanism by which DNA regulatory proteins recognize specific sites on DNA remains the subject of debate (5, 8-11).
It was generally believed that a site-specific DNA recognition code would be revealed once high resolution structures of protein/DNA complexes were determined by X-ray crystallography. X-ray crystallographic structures of protein/DNA complexes have recently been determined, and interactions between amino acids within recognition helices and nucleotides within specific DNA binding major groove halfsites have been reported (1, 12). However, no simple code describing sequence-specific DNA recognition emerged. To investigate the underlying mechanism of DNA recognition, we previously reported that a high degree of nucleotide similarity is conserved between nucleotide subsequences of the c-DNA encoding regulatory proteins' DNA recognition alpha helices and the cognate operators or hormone response elements to which they specifically bind (5, 10-11). These findings led us to hypothesize that conservation of genetic information is a common determinant of site-specific DNA recognition by DNA regulatory proteins. Applying this hypothesis, and using computer based nucleotide similarity searches coupled with secondary structure prediction and model building, we located putative DNA recognition alpha helices for members of the steroid-thyroid hormone receptor protein family before the DBD structures of these proteins were determined (5). Furthermore, using model building and molecular dynamics, we studied the putative DNA recognition helix of the glucocorticoid receptor (GR) interacting with a GRE and reported the resulting amino acid/nucleotide atomic interactions (5). Our predictions were subsequently confirmed by NMR (13) and X-ray crystallography (12) studies of the GR DBD in complex with a GRE.
Specifically, these studies reported a DNA recognition helix for the GR whose amino acid sequence was 100% identical to the putative GR DNA recognition helix sequence we had described earlier on the basis of genetic information conservation (5). In addition, the orientation of the GR DNA recognition helix within the right DNA major groove halfsite of its cognate GRE in the GR DBD/GRE co-crystal agreed with our earlier model (5).
Recently, using the genomic structure of the GR gene (14) as a guide, we separately compared the nucleotide sequences of exons 3, 4, and 5, which encode the DBD of the GR protein, with a nucleotide sequence upstream of the mouse mammary tumor virus (MMTV) transcription start site that contains known GRE sites (GenBank MMPRGR1). We reported that genetic information was conserved between the c-DNA encoding the GR DBD and the DNA sequence of a well characterized GRE and its flanking nucleotides (11). Within the GR DBD coding sequence, the regions of nucleotide subsequence similarity to the GRE and its flanks occurred specifically at the ends of exons 3, 4, and 5, at their splice junction sites.
In the present study, we used molecular dynamics simulations in solvent to investigate interactions between GR DBD amino acids and the GRE and its flanking nucleotides described above (11). We compared findings from a fully solvated 80 Angstrom water droplet GR DBD/GRE model with those from 10 Angstrom water layer GR DBD/GRE models, monitoring hydrogen bonding interactions and calculating van der Waals and electrostatic interaction energies. The simulations show that GR DBD amino acids of the DNA recognition helix, encoded in exon 3, form both direct and water mediated hydrogen bonds at cognate codon/anticodon nucleotide base sites within the GRE right major groove halfsite. Likewise, amino acids of the beta strand encoded in exon 4 and of a predicted alpha helix encoded in exon 5 form hydrogen bonds with nucleotide base and DNA backbone sites at cognate codon/anticodon nucleotide sites within the GRE DNA major groove halfsites and the GRE flanking regions, respectively. Together, these interactions induce breakage of Watson-Crick base pairing hydrogen bonds, resulting in DNA bending, strand elongation and unwinding events similar to those described for helicases (15).
MATERIALS AND METHODS
Model Building
Two models of the GR DBD dimer were used in this study; both were derived from NMR atomic coordinates of the GR DBD (Kaptein, personal communication) (13). Residues following Arg 510 were disordered in the NMR structure determination, and no coordinates were reported for them. The amino acid sequence from Arg 510 to Lys 517 contains a predicted alpha helix, encoded by exon 5, which we previously reported to share genetic similarity with the GRE flanking nucleotide regions (11). Therefore, to study potential interactions between amino acids of this predicted alpha helix and nucleotides flanking the GRE, we used the QUANTA program (16) from Molecular Simulations Inc. to build an alpha helix of the exon 5 encoded amino acids 511 to 517 and attach it to Arg 510. Models of the NMR determined GR DBD without the exon 5 encoded alpha helix were used for a comparative study of GR DBD/GRE interactions in an 80 Angstrom water droplet and in a 10 Angstrom water layer. The solvated molecular dynamics simulations of these GR DBD/GRE models are described below.
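The helix for residues 511-517 was built with QUANTA, whose builder is not reproduced here. As a rough illustration of the geometry involved, the sketch below generates an ideal alpha-helical Cα trace from textbook parameters (about 3.6 residues per turn, a rise of 1.5 Angstroms per residue and a Cα radius of about 2.3 Angstroms). The function name `ideal_helix_ca` is hypothetical, only Cα positions are produced, and superimposing the trace onto the backbone of Arg 510 is not shown.

```python
import math

# Textbook ideal alpha-helix parameters for a Calpha trace (assumed values).
RISE = 1.5      # Angstrom along the helix axis per residue
RADIUS = 2.3    # Angstrom, Calpha distance from the helix axis
TWIST = 100.0   # degrees of rotation per residue (360 / 3.6)


def ideal_helix_ca(first_resid, last_resid):
    """Return {resid: (x, y, z)} Calpha positions for an ideal alpha helix.

    The helix axis is the z axis and the first residue sits at z = 0.
    This is a coordinate-generation sketch, not the QUANTA helix builder.
    """
    coords = {}
    for i, resid in enumerate(range(first_resid, last_resid + 1)):
        theta = math.radians(TWIST * i)
        coords[resid] = (RADIUS * math.cos(theta),
                         RADIUS * math.sin(theta),
                         RISE * i)
    return coords


if __name__ == "__main__":
    helix = ideal_helix_ca(511, 517)
    for resid, (x, y, z) in helix.items():
        print(f"{resid}: {x:7.3f} {y:7.3f} {z:7.3f}")
    # Consecutive Calpha atoms should be close to the usual ~3.8 A spacing.
    print(f"Ca(511)-Ca(512) = {math.dist(helix[511], helix[512]):.2f} A")
```

The consecutive Cα spacing printed at the end is a quick sanity check that the chosen rise, radius and twist describe a plausible helix.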
All DNA models were built with the NUCLEIC ACID BUILDER module of the QUANTA program (16). Two B-form DNA models of the naturally occurring MMTV GRE from GenBank locus MMTPRGR1, which shares genetic similarity with the c-DNA encoding the GR DBD (5, 11), were created. The 17 base pair GRE model was used in the 80 Angstrom water droplet solvated NMR GR DBD/GRE model and, for comparison, in a 10 Angstrom water layer NMR GR DBD/GRE model built with the same 17 base pair GRE nucleotide sequence. The second model, a 29 base pair DNA sequence consisting of the same GRE with flanking nucleotides, was used in a 10 Angstrom water layer GR DBD/GRE model. We used this model to study the GR DBD amino acid interactions described above, as well as interactions between the exon 5 encoded predicted alpha helix amino acids and nucleotides of the GRE flanks. To orient and dock the GR DNA recognition helix relative to the DNA at the GRE sites, we applied findings on GR DNA binding from DNase protection (17-18), methylation inhibition (19), X-ray crystallography (12) and conservation of genetic information (5, 11) studies.
Dynamics Parameters
The solvated molecular dynamics simulations were run on CRAY-2 and CRAY C-90 supercomputers using a specially optimized version of CHARMm (release 21.3) with an atom limit of 30,000. Computation for the 80 Angstrom water droplet GR DBD/GRE model was extremely demanding, requiring approximately 8.17 CRAY-2 CPU hours per picosecond of simulation. This demand was due to the 22,608 water atoms needed to construct the 80 Angstrom water droplet around the GR DBD/17 bp GRE complex, which itself consisted of 2,184 atoms, for a total of 24,892 atoms. We also constructed a model of the GR DBD/17 bp GRE in a 10 Angstrom water layer, which reduced the number of water atoms to 5,328, for a total of 7,512 atoms in the solvated complex. This model required approximately 0.39 CRAY C-90 CPU hours per picosecond of simulation (equivalent to 1.31 CRAY-2 CPU hours per picosecond). To study interactions between nucleotides flanking the GRE DNA major groove halfsites and amino acids of the predicted alpha helix at the carboxyl end of the GR DNA binding domain, as described above and elsewhere (11), we constructed a larger GR DBD/29 bp GRE complex of 2,908 atoms. Solvating this model with a 10 Angstrom water layer required 6,717 water atoms, for a total of 9,625 atoms. This model required approximately 0.60 CRAY C-90 CPU hours per picosecond of simulation (equivalent to 2.01 CRAY-2 CPU hours per picosecond).
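The atom counts and CPU costs above can be cross-checked with a few lines of arithmetic. The sketch below uses only figures stated in the text (complex and water atom counts, CPU hours per picosecond expressed in CRAY-2 units, and the production run lengths given in the next paragraph) to estimate the CPU time of each production run. Note that the stated components of the droplet model (2,184 complex atoms plus 22,608 water atoms) sum to 24,792 rather than 24,892; the sketch reports the sum of the components. The class and field names are hypothetical.

```python
from dataclasses import dataclass

ATOM_LIMIT = 30_000  # atom limit of the optimized CHARMm build (from the text)


@dataclass
class SolvatedModel:
    name: str
    complex_atoms: int         # GR DBD/GRE complex atoms, as stated
    water_atoms: int
    cray2_hours_per_ps: float  # CPU cost in CRAY-2 hours per picosecond
    production_ps: float       # length of the production dynamics run

    @property
    def total_atoms(self) -> int:
        return self.complex_atoms + self.water_atoms

    @property
    def production_cray2_hours(self) -> float:
        return self.cray2_hours_per_ps * self.production_ps


MODELS = [
    SolvatedModel("80 A droplet, GR DBD/17 bp GRE", 2184, 22608, 8.17, 30),
    SolvatedModel("10 A layer, GR DBD/17 bp GRE", 2184, 5328, 1.31, 30),
    SolvatedModel("10 A layer, GR DBD/29 bp GRE", 2908, 6717, 2.01, 300),
]

if __name__ == "__main__":
    for m in MODELS:
        assert m.total_atoms <= ATOM_LIMIT, m.name
        print(f"{m.name}: {m.total_atoms} atoms, "
              f"~{m.production_cray2_hours:.0f} CRAY-2 CPU hours "
              f"for {m.production_ps:g} ps of production dynamics")
```

Run as a script, it estimates roughly 245, 39 and 603 CRAY-2 CPU hours for the three production runs alone (heating and equilibration excluded), which makes concrete why the thinner water layer was used for the 300 picosecond simulation.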
The solvated models were first minimized for 200 cycles using the Steepest Descents method and then for 100 cycles using the Adopted Basis Newton-Raphson method. Heating was run for 600 cycles at 0.001 picoseconds per cycle, for a total of 0.6 picoseconds, with a temperature increase of 0.5 K per cycle (from 0 to 300 K). Equilibration was run for 1,000 cycles (1 picosecond), giving an overall temperature RMS deviation of approximately 3 K. Finally, production molecular dynamics were run with velocity scaling and a step size of 0.001 picoseconds for an additional 30 picoseconds (30,000 cycles) for the 80 Angstrom water droplet GR DBD/17 bp GRE and 10 Angstrom water layer GR DBD/17 bp GRE models, and for 300 picoseconds (300,000 cycles) for the 10 Angstrom water layer GR DBD/29 bp GRE model. A constant dielectric with a dielectric constant (ε) of 1.00 was used in the solvated molecular dynamics simulations. A non-bonded cutoff of 15.00 Angstroms was used in all simulations. Non-bonded parameters were updated every 20 cycles, and all energy terms were computed. For a detailed discussion of the CHARMm potential energy function see reference (20), and for a review of molecular dynamics in the biological sciences see reference (21).
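The heating schedule and velocity scaling described above can be sketched as follows. This is a minimal stand-alone illustration, not CHARMm's implementation: it assumes a linear ramp of 0.5 K per cycle from 0 to 300 K over 600 cycles, computes the instantaneous temperature from the kinetic energy with 3N degrees of freedom (constraints ignored), and rescales every velocity by sqrt(T_target / T_current). The function names and the toy system of 100 carbon-mass atoms are hypothetical.

```python
import math
import random

# Boltzmann constant in amu * Angstrom^2 / (ps^2 * K): with masses in amu and
# velocities in Angstrom/ps, temperatures come out directly in kelvin.
KB = 0.8314462618


def heating_target(cycle, t_start=0.0, t_end=300.0, n_cycles=600):
    """Linear ramp: 0 -> 300 K over 600 cycles, i.e. 0.5 K per cycle."""
    return t_start + (t_end - t_start) * min(cycle, n_cycles) / n_cycles


def kinetic_temperature(masses, velocities):
    """T = 2 * KE / (N_dof * kB), with N_dof = 3N (no constraints removed)."""
    ke = 0.5 * sum(m * (vx * vx + vy * vy + vz * vz)
                   for m, (vx, vy, vz) in zip(masses, velocities))
    return 2.0 * ke / (3 * len(masses) * KB)


def rescale_velocities(masses, velocities, target):
    """Velocity scaling: multiply every velocity by sqrt(T_target / T_now)."""
    t_now = kinetic_temperature(masses, velocities)
    if t_now == 0.0:
        raise ValueError("cannot rescale a system with zero kinetic energy")
    s = math.sqrt(target / t_now)
    return [(s * vx, s * vy, s * vz) for vx, vy, vz in velocities]


if __name__ == "__main__":
    rng = random.Random(0)
    masses = [12.011] * 100  # toy system of carbon-mass atoms
    vel = [tuple(rng.gauss(0.0, 1.0) for _ in range(3)) for _ in masses]
    for cycle in (1, 300, 600):  # a few points on the heating ramp
        vel = rescale_velocities(masses, vel, heating_target(cycle))
        print(cycle, round(kinetic_temperature(masses, vel), 3))
```

In this sketch, applying the same rescaling periodically with a fixed target of 300 K gives the simplest form of temperature control by velocity scaling.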
Explicit sodium counter-ions were included in the DNA models in all simulations, based on geometry provided by Don Gregory, Ph.D., of Molecular Simulations Inc. Zinc atoms were placed in the GR structure and tetrahedrally coordinated with the sulphur atoms of the "zinc-finger" cysteines. The residue topology file (RTF) was altered to create a new residue type for the "zinc-finger" cysteines, 'ZCY' (zinc binding cysteine), in which the charge on the sulphur atom was made more negative, from -0.19 to -0.50, so that the four tetrahedrally coordinated cysteine sulphur atoms would neutralize the +2.0 charge on the zinc atom. In addition, the charges on the zinc binding cysteine beta carbons were increased from +0.19 to +0.40, and those on the alpha carbons from +0.10 to +0.20, to keep the ZCY residue at a net charge of 0.0.
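The charge bookkeeping behind the ZCY residue can be checked directly. The sketch below uses the partial charges quoted above; the atom labels SG, CB and CA (sulphur, beta carbon, alpha carbon) follow common PDB/CHARMM naming and are assumed here, and all other cysteine charges are assumed unchanged. It confirms that four sulphur charges of -0.50 offset the +2.0 zinc charge, and that the beta and alpha carbon adjustments exactly cancel the change on the sulphur, leaving the residue's net charge unchanged.

```python
import math

# Partial charges (units of e) on the atoms the text says were modified.
# SG = sulphur, CB = beta carbon, CA = alpha carbon (assumed labels);
# all other residue charges are assumed to be unchanged.
CYS = {"SG": -0.19, "CB": +0.19, "CA": +0.10}  # standard cysteine, as quoted
ZCY = {"SG": -0.50, "CB": +0.40, "CA": +0.20}  # zinc binding cysteine
ZINC = +2.0
LIGANDS_PER_ZINC = 4  # tetrahedral Cys4 coordination


def charge_shift(before, after):
    """Net change in residue charge caused by the re-parameterization."""
    return sum(after.values()) - sum(before.values())


def site_charge(sg_charge, zinc=ZINC, n=LIGANDS_PER_ZINC):
    """Net charge of the zinc atom plus its n coordinating sulphur atoms."""
    return zinc + n * sg_charge


if __name__ == "__main__":
    print(f"ZCY minus CYS net charge: {charge_shift(CYS, ZCY):+.2f} e")
    print(f"Zn + 4 SG charge:         {site_charge(ZCY['SG']):+.2f} e")
```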
DNA Groove Geometry Calculations
The conformational changes of the DNA during dynamics were evaluated using the CURVES 4.1 program provided by Richard Lavery of Laboratoire de Biochimie Theorique CNRS (personal communication). The documentation provided describes CURVES as "an algorithm for calculating a helical parameter description for any irregular nucleic acid segment with respect to an optimal, global helical axis. The solution is obtained by minimizing a function which represents the variations in helical parameters between successive nucleotides as well as quantifying the kinks and dislocations which exist between successive helical axis segments." For more detailed information regarding the CURVES 4.1 program see references (22-23).
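CURVES derives groove widths relative to a curvilinear helical axis, which is beyond a short example. For orientation only, the sketch below implements a simpler and widely used alternative, not the CURVES algorithm: the groove width taken as the shortest phosphorus-phosphorus distance across the groove minus 5.8 Angstroms, an allowance for the van der Waals radii of the two phosphate groups (the convention used, for example, by the 3DNA package). Selecting which phosphorus atoms face each other across the minor or major groove is left to the caller; the function names and the coordinates in the example are hypothetical.

```python
import math

# Allowance for the van der Waals radii of two phosphate groups (Angstrom),
# as in the simple P-P convention; this is NOT the CURVES helical-axis method.
PHOSPHATE_CORRECTION = 5.8


def groove_width(p_strand1, p_strand2):
    """Shortest cross-strand P-P distance minus the phosphate correction.

    p_strand1, p_strand2: sequences of (x, y, z) phosphorus coordinates in
    Angstroms, pre-selected by the caller to face each other across one groove.
    """
    d_min = min(math.dist(p, q) for p in p_strand1 for q in p_strand2)
    return d_min - PHOSPHATE_CORRECTION


def percent_change(width, baseline):
    """Change relative to the zero-picosecond baseline width, in percent."""
    return 100.0 * (width - baseline) / baseline


if __name__ == "__main__":
    # Hypothetical coordinates, chosen only to exercise the functions.
    strand1 = [(0.0, 0.0, 0.0), (0.0, 0.0, 6.8)]
    strand2 = [(11.4, 0.0, 0.0), (11.4, 0.0, 6.8)]
    w = groove_width(strand1, strand2)
    print(f"width = {w:.1f} A, change vs 5.6 A baseline: "
          f"{percent_change(w, 5.6):+.1f}%")
```

Comparing each snapshot's width against the zero-picosecond value, as `percent_change` does, mirrors the baseline approach used for figures 3g and 3h, whatever width definition is chosen.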
Interaction Energy Calculations
Interaction energies between GRE nucleotides and selected GR DBD amino acids were calculated with CHARMm (20-21) and plotted. In all graphs, interaction energy was calculated using a constant dielectric with a dielectric constant (ε) of 1.00. "Total Energy" is the sum of the electrostatic and van der Waals interaction energies. The value given for the interaction of a particular amino acid with a particular nucleotide is the sum of the interaction energies over all atom pairs between those two residues.
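A per-residue interaction energy of this kind is a double sum over atom pairs. The sketch below assumes a CHARMM-style functional form: a Coulomb term 332.0716 q_i q_j / (ε r) with ε = 1.00 as stated, plus a 12-6 Lennard-Jones term written with a well depth and a minimum-energy distance, combined by the geometric mean of the well depths and the sum of the half-distances. Parameters are supplied by the caller (those in the usage example are made up), well depths are given as positive numbers, and the 15 Angstrom cutoff is applied as a hard truncation rather than with any switching or shifting function. Function and key names are hypothetical.

```python
import math

COULOMB = 332.0716  # kcal * Angstrom / (mol * e^2)


def pair_energy(a, b, dielectric=1.0):
    """Return (electrostatic, van der Waals) energy in kcal/mol for two atoms.

    Atoms are dicts with keys: xyz (Angstrom), q (charge, e),
    eps (LJ well depth, kcal/mol, positive), rmin_half (Rmin/2, Angstrom).
    """
    r = math.dist(a["xyz"], b["xyz"])
    elec = COULOMB * a["q"] * b["q"] / (dielectric * r)
    eps = math.sqrt(a["eps"] * b["eps"])    # geometric-mean well depth
    rmin = a["rmin_half"] + b["rmin_half"]  # sum of half-distances
    x6 = (rmin / r) ** 6
    vdw = eps * (x6 * x6 - 2.0 * x6)        # minimum of -eps at r = rmin
    return elec, vdw


def residue_interaction(res_a, res_b, cutoff=15.0, dielectric=1.0):
    """Sum pair energies over all atom pairs within the cutoff."""
    elec_sum = vdw_sum = 0.0
    for a in res_a:
        for b in res_b:
            if math.dist(a["xyz"], b["xyz"]) > cutoff:
                continue  # hard truncation beyond the cutoff
            elec, vdw = pair_energy(a, b, dielectric)
            elec_sum += elec
            vdw_sum += vdw
    return {"electrostatic": elec_sum, "vdw": vdw_sum,
            "total": elec_sum + vdw_sum}


if __name__ == "__main__":
    # Made-up one-atom "residues": a cationic donor and an anionic acceptor.
    lys_like = [{"xyz": (0.0, 0.0, 0.0), "q": 0.8, "eps": 0.2, "rmin_half": 1.8}]
    dna_like = [{"xyz": (3.0, 0.0, 0.0), "q": -0.6, "eps": 0.15, "rmin_half": 1.7}]
    print(residue_interaction(lys_like, dna_like))
```

The "total" entry corresponds to the "Total Energy" plotted in the graphs: the electrostatic and van der Waals sums added together.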
Hydrogen Bond Calculations
Hydrogen bond interactions between amino acids and nucleotides in the 80 Angstrom water droplet GR DBD/17 bp GRE model were recorded at 0.1 picosecond intervals (300 snapshots in total); those in the 10 Angstrom water layer GR DBD/29 bp GRE model were recorded at 1.0 picosecond intervals (300 snapshots in total). Equivalent functional sites on the amino acids are grouped. For example, using standard IUPAC atom nomenclature, lysine hydrogen bond donor sites HZ1, HZ2 and HZ3 are combined as HZ. Arginine hydrogen bond donor sites HH11 and HH12 are combined as HH1, and HH21 and HH22 as HH2. Glutamine hydrogen bond donor sites HE21 and HE22 are combined as HE2. Asparagine hydrogen bond donor sites HD21 and HD22 are combined as HD2. Glutamic acid hydrogen bond acceptor sites OE1 and OE2 are combined as OE. Likewise, the DNA backbone phosphate group hydrogen bond acceptor sites O1P and O2P are combined as OP. For each amino acid/nucleotide H-bond interaction, the first and last occurrences during the 30 or 300 picosecond production simulation were recorded at 0.1 or 1.0 picosecond intervals, respectively. Because 300 snapshots were recorded, a frequency greater than 300 for a given amino acid/nucleotide interaction reflects multiple hydrogen bonds, i.e. bidentate H-bonds in which two or more of the grouped atoms of one residue interact with the same atom of another residue. The hydrogen bonding interactions between amino acids encoded by exons 3, 4 and 5 of the GR DBD and nucleotides of the GRE and flanking regions were monitored. Hydrogen bonds were computed with a distance-angle algorithm based on an analysis of hydrogen bonding in proteins (24). The maximum distance allowed between the hydrogen atom and the acceptor was 2.5 Angstroms, and the maximum distance allowed between the atom bearing the hydrogen and the acceptor was 3.3 Angstroms. The minimum angles at the acceptor, at the hydrogen, and at the atom bearing the hydrogen were each 90 degrees (range 0 to 180 degrees).
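The grouping of equivalent atoms, the distance-angle test and the frequency count described above can be combined into a short sketch. The distance and angle limits are those stated in the text. How the acceptor and donor angles are defined is not spelled out, so the sketch assumes the acceptor angle is H...A-AA (AA being a heavy atom bonded to the acceptor) and the donor angle is DD-D...A (DD being a heavy atom bonded to the donor); the function names and the tuple layout of each recorded H-bond are likewise hypothetical.

```python
import math
from collections import Counter

# Equivalent atoms merged into one functional site, as listed in the text.
SITE_GROUPS = {
    "HZ1": "HZ", "HZ2": "HZ", "HZ3": "HZ",  # lysine
    "HH11": "HH1", "HH12": "HH1",           # arginine
    "HH21": "HH2", "HH22": "HH2",           # arginine
    "HE21": "HE2", "HE22": "HE2",           # glutamine
    "HD21": "HD2", "HD22": "HD2",           # asparagine
    "OE1": "OE", "OE2": "OE",               # glutamic acid
    "O1P": "OP", "O2P": "OP",               # DNA phosphate
}
MAX_H_ACC = 2.5   # Angstrom, hydrogen ... acceptor
MAX_D_ACC = 3.3   # Angstrom, donor heavy atom ... acceptor
MIN_ANGLE = 90.0  # degrees, at the hydrogen, the acceptor and the donor


def angle(a, b, c):
    """Angle a-b-c in degrees, with the vertex at b."""
    u = [p - q for p, q in zip(a, b)]
    v = [p - q for p, q in zip(c, b)]
    cos_t = sum(p * q for p, q in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def is_hbond(donor, hydrogen, acceptor, acc_bonded, don_bonded):
    """Distance-angle test; acc_bonded/don_bonded are the assumed antecedents."""
    if math.dist(hydrogen, acceptor) > MAX_H_ACC:
        return False
    if math.dist(donor, acceptor) > MAX_D_ACC:
        return False
    return (angle(donor, hydrogen, acceptor) >= MIN_ANGLE
            and angle(hydrogen, acceptor, acc_bonded) >= MIN_ANGLE
            and angle(don_bonded, donor, acceptor) >= MIN_ANGLE)


def tally(snapshots, interval_ps):
    """Count grouped H-bonds over snapshots and record first/last times (ps).

    snapshots: one list per saved frame of (residue_a, atom_a, residue_b,
    atom_b) tuples for the H-bonds found in that frame. Two grouped atoms
    bonding to the same partner atom in one frame (a bidentate contact) add 2,
    so with 300 frames a count above 300 implies multiple H-bonds.
    """
    counts, first, last = Counter(), {}, {}
    for i, bonds in enumerate(snapshots, start=1):
        t = i * interval_ps
        for res_a, atom_a, res_b, atom_b in bonds:
            key = (res_a, SITE_GROUPS.get(atom_a, atom_a),
                   res_b, SITE_GROUPS.get(atom_b, atom_b))
            counts[key] += 1
            first.setdefault(key, t)
            last[key] = t
    return counts, first, last


if __name__ == "__main__":
    # Hypothetical coordinates (Angstrom): a nearly linear N-H...O contact.
    print(is_hbond(donor=(0.0, 0.0, 0.0), hydrogen=(1.0, 0.0, 0.0),
                   acceptor=(2.9, 0.0, 0.0), acc_bonded=(3.9, 0.5, 0.0),
                   don_bonded=(-1.0, 0.5, 0.0)))
```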
RESULTS AND DISCUSSION
Molecular Dynamics
Conformational changes have been reported to occur in DNA as a result of the binding of prokaryotic and eukaryotic DNA regulatory proteins to their cognate DNA sites (1, 25-27). These observations indicate that interactions between amino acids of the regulatory proteins' DNA recognition helices and nucleotides of their cognate DNA binding sites are dynamic events. Consequently, determining the recognition events that lead to protein/DNA complex formation requires monitoring the motions of individual atoms in the complex as a function of time. Molecular dynamics simulations have been used to study dynamic biological phenomena and to refine macromolecular structures determined by NMR and X-ray crystallography (20, 21).
Recently, we reported that nucleotide subsequence similarity exists between a well characterized GRE and its flanking nucleotides and the c-DNA which encodes amino acids of the GR DBD (11). A summary of these findings, reprinted from PNAS with permission, is shown in figure 1a-c. By model building we also observed that GR DBD amino acids encoded at the splice junctions of exons 3, 4 and 5 are aligned with their cognate codon/anticodon nucleotides within the GRE right and left DNA major groove halfsites and flanks. This included amino acids of the GR DNA recognition helix encoded in exon 3, a beta strand encoded in exon 4, and amino acids of a predicted alpha helix encoded in exon 5 at the exon 4 and 5 splice junction site (see figure 1c).
These findings suggested that amino acids within the above structures may interact with their cognate codon/anticodon nucleotides within the GRE and its flanks. To investigate this possibility, we docked the GR DBD dimer at H-bonding distance within the DNA major groove halfsites of the GRE. Using CHARMm, we conducted 30 picoseconds of molecular dynamics on a fully solvated 80 Angstrom water droplet model of the NMR GR DBD structure docked at the 17 bp nucleotide sequence containing the GRE that shares nucleotide subsequence similarity with the c-DNA encoding the GR DBD amino acids, as shown in figure 1. We also conducted 30 picoseconds of molecular dynamics on a 10 Angstrom water layer model of the NMR derived GR DBD structure docked at the same 17 bp GRE sequence. Comparing direct and water mediated GR DBD amino acid sidechain/GRE nucleotide interactions between the two models revealed similar results (data not shown). These findings supported the use of a 10 Angstrom water layer model to study GR DBD amino acid/GRE nucleotide interactions during molecular dynamics. We therefore constructed a 10 Angstrom water layer model of the NMR derived GR DBD structure, with the exon 5 encoded predicted alpha helix attached, docked at a 29 bp sequence comprising the GRE and its flanking nucleotides. This GR DBD/29 bp GRE model is shown without water molecules in figure 2, where the GR DBD is displaced about 10 Angstroms from the DNA for visual clarity. Figure 2 serves as a key for locating the interactions found between GR DBD amino acids and nucleotides of the GRE and its flanks during molecular dynamics.
In preparation for molecular dynamics, we docked the GR DBD dimer at H-bonding distance within the DNA major groove halfsites of the 29 bp GRE sequence, placed the complex in a 10 Angstrom water layer, and conducted 300 picoseconds of molecular dynamics. The 80 Angstrom water droplet GR DBD/17 bp GRE and the 10 Angstrom water layer GR DBD/29 bp GRE models are compared structurally in figures 3a-f. The solvated models before molecular dynamics are shown in figures 3a and 3b. The same models after 30 and 300 picoseconds of molecular dynamics, respectively, are shown in figures 3c and 3d with water molecules removed for clarity; white dotted lines represent hydrogen bonds. In both models, the DNA appears to wrap around the GR DBD DNA recognition alpha helices. In addition, in the 10 Angstrom water layer GR DBD/29 bp GRE model, nucleotides flanking the GRE major groove halfsites are drawn toward amino acids of the exon 5 encoded predicted alpha helix (figure 3d; see figure 2 for reference). Figures 3e and 3f show the GR DBD/GRE models after 30 and 300 picoseconds of dynamics, with the protein displaced about 10 Angstroms from the DNA for visual clarity. Only those water molecules involved in hydrogen bonding interactions between amino acids and nucleotides during molecular dynamics are shown at the protein/DNA interface.
To further characterize the geometric changes in the GRE DNA that result from interaction with the GR DBD, we used the CURVES program (22) to analyze GRE DNA major and minor groove widths after 30 and 300 picoseconds of molecular dynamics and compared them with the GRE DNA at zero picoseconds for both GR DBD/GRE models; the results are shown in figures 3g and 3h. The DNA major and minor groove widths at zero picoseconds were 11.4 and 5.6 Angstroms, respectively, in close agreement with values reported for canonical B-DNA duplexes (23). These values served as a baseline for monitoring changes in groove width during molecular dynamics. Both models showed an increase in the width of the GRE DNA right major groove halfsite, in agreement with the GR DBD/GRE co-crystal findings (12). Both models also showed a decrease in minor groove width between the two major groove halfsites (figures 3g and 3h); similar findings have been reported for certain prokaryotic DNA regulatory protein/DNA complexes (23). In addition, in the 10 Angstrom water layer GR DBD/29 bp GRE model, the GRE flanking regions showed a marked decrease in minor groove width within the poly A/T sequence of the flanking DNA, reflecting bending toward the GR protein (figure 3h). Similar DNA bending has been reported for other DNA regulatory protein/DNA complexes (25-27).
Electrostatic and van der Waals interactions
We calculated van der Waals and electrostatic interaction energies between amino acids of the GR DBD and nucleotides on the sense and antisense strands of the GRE and its flanks. Because the GR DBD binds preferentially and specifically, as a monomer, to nucleotides of the GRE right major groove halfsite, the interactions of DNA recognition helix amino acids in the right protein monomer are shown. Calculations were performed on the minimized, heated and equilibrated structures to analyze the attractive forces between GR DBD DNA recognition helix amino acids and GRE nucleotides at the beginning of the dynamics simulation. The total interaction energy (kcal/mol), the sum of the van der Waals and electrostatic energies, was recorded for the hydrophilic amino acids of the GR DNA recognition helix and the nucleotide base pairs within the GRE DNA right major groove halfsite. The maximal attractive energy for the Lys 461, Lys 465 and Glu 469 sidechains was with their cognate codon or anticodon nucleotide base pairs within 5'-AAGAA-3'/5'-TTCTT-3', a palindromic sequence in the GRE DNA right major groove halfsite that contains codons for Lys (AAG) and Glu (GAA) read in both directions, 5'-to-3' and 3'-to-5' (figures 4a, 4c and 4e). In addition, Val 462 showed a strong van der Waals interaction with the middle nucleotide of its codon GTT (figure 4b). This interaction of Val 462 at the middle nucleotide of its codon site in the right major groove halfsite agrees with our original prediction for this amino acid (5), which was recently confirmed by Luisi et al. (12). The maximal attractive energy for Arg 466 was not directed toward its codon/anticodon nucleotide base pair (figure 4d). However, early in the molecular dynamics Arg 466 showed a strong attractive energy for its codon nucleotide A39 (data not shown).
This attraction is reflected in the H-bonding of Arg 466 with its codon nucleotide A39 shown in tables 1 and 2. Our results show global electrostatic attraction of GR DNA recognition helix amino acids toward their cognate codon/anticodon nucleotides within the GRE right major groove halfsite. In addition, Gln 471 of the exon 4 encoded beta strand has maximal attractive energy for its anticodon nucleotides on the sense strand (T19; GTT, reading 3'-to-5') and its codon nucleotides on the antisense strand (C21; CAG, reading 5'-to-3'), see figure 4f.
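The codon/anticodon correspondences discussed here can be enumerated mechanically. The paper does not give a formal procedure, so the sketch below assumes one: every overlapping triplet of a short sequence is translated with the standard genetic code, reading each strand both 5'-to-3' and 3'-to-5', which covers the "both directions" readings described above as well as the complementary-strand (anticodon) relationships. The function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order (TTT, TTC, TTA, TTG, TCT, ...); * = stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def readings(sense):
    """Translate all overlapping triplets of a 5'->3' sequence in four readings."""
    comp = sense.translate(COMPLEMENT)  # antisense strand, written 3'->5'
    strands = {
        "sense 5'->3'": sense,
        "sense 3'->5'": sense[::-1],
        "antisense 5'->3'": comp[::-1],
        "antisense 3'->5'": comp,
    }
    return {name: [CODE[s[i:i + 3]] for i in range(len(s) - 2)]
            for name, s in strands.items()}


def encoded_residues(sense):
    """Set of one-letter amino acids readable from the sequence in any reading."""
    return {aa for triplets in readings(sense).values() for aa in triplets}


if __name__ == "__main__":
    for name, aas in readings("AAGAA").items():
        print(f"{name:18s} {' '.join(aas)}")
```

For AAGAA both sense-strand readings give Lys, Arg and Glu (AAG, AGA, GAA), the symmetry noted in the text. Applied to the two halfsite motifs discussed later in this section, `encoded_residues("TGTTCT")` contains K, R and E, whereas `encoded_residues("TGTAAC")` contains none of them.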
Hydrogen Bonding Interactions
Hydrogen bonding interactions between amino acids encoded by exons 3, 4 and 5 of the NMR GR DBD and nucleotides of the GRE and flanking regions were also monitored (see Materials and Methods). The H-bonding interactions occurring during molecular dynamics are summarized in table 1 for the 80 Angstrom water droplet GR DBD/17 bp GRE model and in table 2 for the 10 Angstrom water layer GR DBD/29 bp GRE model. Residues His 451 and Tyr 452, conserved in the steroid receptor family, are located in the loop of the first "zinc finger" (5). These amino acids form H-bonds at nucleotide base and DNA backbone sites on the outer margins of the left and right major groove halfsites of the GRE (data not shown), as well as with the sidechain of Lys 461 (figure 5a). Likewise, amino acids Gln 471, Asn 473, Tyr 474 and Leu 475 of the exon 4 encoded beta strand, at the amino terminus of the second "zinc finger" adjacent to the DNA recognition helix, make H-bond contacts with nucleotide base and DNA backbone sites at codon/anticodon sites on the inner margins of the right and left major groove halfsites (tables 1 and 2). These interactions help align the DNA recognition helix amino acids within the GRE major groove halfsites. Interestingly, similar DNA backbone H-bond interactions by amino acids in structures adjacent to DNA recognition helices are seen among prokaryotic DNA regulatory proteins, where they are reported to contribute to DNA binding affinity and to position the DNA recognition helices within their cognate operator DNA major groove halfsites (1).
Specific Amino Acid-Nucleotide Interactions
Amino acids Lys 461, Lys 465 and Arg 466 of the GR DNA recognition helix are conserved at similar positions within the DNA recognition helices of the steroid receptor family (5). We observed that these three amino acids form both direct and water mediated multidentate H-bonds at cognate codon/anticodon nucleotide base sites within the GRE right major groove halfsite, as shown in tables 1 and 2 (see figures 1 and 2 for reference). Close-up views of specific amino acid/nucleotide H-bonding interactions are shown in figures 5a-d. Figure 5a shows a direct H-bond between the sidechain of Lys 465 and its codon nucleotide G38 at the O6 Watson-Crick (WC) site, and an H-bond between Lys 465 and T20 (anticodon, 3'-to-5') at the O4 WC site. These H-bonding observations agree with the total attractive energy found between Lys 465 and nucleotides G38 and T20 (figure 4c). Figure 5a also shows the H-bonding interactions of Lys 461, which forms direct H-bonds with its codon nucleotide G38 at the N7 and O6 base sites and with the N7 base site of its codon nucleotide A37, as well as H-bonds with the sidechains of His 451 and Tyr 452. Acting in concert, these interactions disrupt the WC H-bonds of the C21-G38 base pair, as can be seen in the figure. Notably, methylation of G38 has been reported to inhibit site-specific DNA binding by the GR protein (19); see figure 2. Figure 5b shows the H-bonding interactions of Arg 466: its sidechain forms direct H-bonds with its codon nucleotide A39 at the N7 base site and with the non-codon nucleotide T19 at the O4 WC site. Arg 466 also forms water mediated H-bonds with the phosphate backbone of A39 and with the sidechain of Glu 469. Glutamic acid 469 in turn forms a water mediated H-bond with the phosphate backbone of its codon nucleotide A39.
Additional amino acids reported to be associated with recognition of the GRE by the GR are Gly 458, Ser 459 and Val 462 (4, 28). Our molecular dynamics simulations indicate that Val 462 has a van der Waals interaction with its codon nucleotide T19 within the GRE right major groove halfsite, as described above (figures 4b and 5b). We also observed that Ser 459 briefly forms a water mediated H-bond with its codon nucleotide T20 (TCT; see table 2 and figure 2). Notably, the nucleotides that differentiate regulatory induction specificity between the GRE and the estrogen response element (ERE) are the third and fourth base pairs of the TGTTCT recognition motif (29), corresponding to base pairs T19-A40 and T20-A39 in the right major groove halfsite of the GRE (see figure 2 for reference). Arg 466, Lys 465 and Glu 469 also interact at the T19-A40 and T20-A39 sites during molecular dynamics; these interactions are shown in figures 5a and 5b and summarized in tables 1 and 2. As the molecular dynamics proceed, WC H-bonds are lost and base pairing is disrupted at these sites. Within the right major groove halfsite, methylation of G18 has also been reported to inhibit site-specific DNA binding by the GR protein (19). Our results show that Gln 471 and Asn 473, encoded in exon 4 at the splice junction of exons 3 and 4, form both direct and water mediated H-bonds with the O6 and N7 base sites, respectively, of their anticodon nucleotide G18 (tables 1 and 2 and figure 5c). The water mediated H-bonding interactions of Leu 475 with its codon nucleotide T17 are also shown in figure 5c and in tables 1 and 2.
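The statement that the GRE and ERE differ at the third and fourth positions of the recognition motif can be checked with a simple positional comparison. The GRE motif TGTTCT and the left halfsite sequence TGTAAC both come from the text; the ERE half-site TGACCT is not given in the text and is assumed here to be the commonly cited consensus (the reverse complement of AGGTCA), written in the same orientation. The function name is hypothetical.

```python
def differing_positions(a, b):
    """1-based positions at which two equal-length motifs differ."""
    if len(a) != len(b):
        raise ValueError("motifs must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]


GRE_RIGHT = "TGTTCT"  # right major groove halfsite motif (from the text)
GRE_LEFT = "TGTAAC"   # left major groove halfsite (from the text)
ERE_HALF = "TGACCT"   # ASSUMPTION: consensus ERE half-site, not given in the text

if __name__ == "__main__":
    print("GRE vs ERE:", differing_positions(GRE_RIGHT, ERE_HALF))
    print("right vs left halfsite:", differing_positions(GRE_RIGHT, GRE_LEFT))
```

Under that assumption the first comparison returns positions 3 and 4, which the text maps to base pairs T19-A40 and T20-A39; the second returns positions 4 to 6, locating where the GRE halfsites depart from a perfect palindrome.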
Hydrogen bonding interactions between exon 3 encoded amino acids of the left GR DBD monomer and nucleotides of the GRE DNA left major groove halfsite involve the same amino acids as in the right GR DBD monomer and occur at the equivalent, dyad-symmetric nucleotide positions. However, the wild type GRE DNA major groove halfsites form an imperfect palindrome: the 5' TGTTCT 3' recognition sequence occurs in the right major groove halfsite, whereas the sequence 5' TGTAAC 3' occurs in the left major groove halfsite (figure 2). Therefore, codon/anticodon nucleotide sites for Lys 461, Lys 465, Arg 466 and Glu 469 are not present in the GRE left major groove halfsite (figure 1), and interactions occur at non-codon nucleotide sites (tables 1 and 2). The GR DBD is reported to bind preferentially and specifically, as a monomer, to the GRE right major groove halfsite containing the TGTTCT recognition sequence, which in turn facilitates cooperative dimerization and subsequent non-specific interaction with nucleotides of the adjacent left major groove halfsite (30). However, methylation of G47 in the left major groove halfsite inhibits site-specific DNA binding by the GR protein dimer (19). Our results show that Gln 471 and Tyr 474 form both direct and water mediated H-bonds with their codon/anticodon nucleotide base pair C12-G47 (tables 1 and 2). This observation, together with the specific atomic interactions between the GR DNA recognition helix amino acids and their cognate codon/anticodon nucleotide bases within the GRE right major groove halfsite described above (tables 1 and 2 and figures 4a-f and 5a-c), supports our hypothesis that conservation of genetic information is a determinant of site-specific DNA recognition and binding (5, 10-11).
Furthermore, the atomic interactions of the amino acids of the exon 4 encoded beta strand at their conserved codon/anticodon nucleotide sites in the left major groove halfsite (see tables 1 and 2 and figures 1 and 2) offer an explanation for the DNA binding preference reported for the GR at this particular GRE site (31), as opposed to the other GRE sites available in the LTR upstream of the MMTV gene initiation site.
Lys 461 is a unique amino acid of the GR DNA recognition helix: point mutation experiments converting Lys 461 to glycine result in loss of transcription stimulation by the GR protein, while DNA binding affinity and specificity for the GRE are not affected (32). This finding suggests that Lys 461 may be involved in a secondary event, following the primary DNA binding interaction, which is essential for transcription stimulation. Additionally, it has been reported that GRE flanking nucleotide sequences are required for efficient binding of the GR protein to the GRE (33). Our findings indicate that Lys 461 interacts at base sites on its codon nucleotides A36 and A37 in the right GRE major groove halfsite (see tables 1 and 2). As the molecular dynamics proceed, the amino acid-nucleotide H-bonding interactions of Lys 461, in concert with Tyr 452 and His 451 as described above (see figure 5a), along with exon 4 encoded amino acids Asn 506 and Leu 507 (see table 2), initiate DNA structural changes which draw the DNA of the GRE flanking regions within H-bonding distance of the exon 5 encoded amino acids 510-517 of the predicted alpha helix on the carboxyl flank of the DBD (see figures 1 and 2 and table 2). We report herein that amino acids Arg 510, Lys 513 and Lys 517 of the predicted alpha helix within the GR right monomer form both direct and water mediated H-bonds on the DNA backbone at codon/anticodon sites of the GRE DNA right major groove halfsite and flanking nucleotide region (see table 2 and figure 5d). Together, these interactions induce DNA bending, strand separation and unwinding events, which are associated with DNA transcription.
Role of Amino Acid-Amino Acid Interactions Within the DNA Recognition Helix
Recently the atomic coordinates of the X-ray crystallography derived GR DBD/GRE model (12) became available from the Brookhaven Protein Data Bank. Upon examination of the DNA recognition helix amino acids in this model, we observed that the sidechain acceptor sites of Glu 469 form H-bonds with the donor sites on the Lys 465 sidechain, thus orienting the Lys 465 sidechain away from the DNA. This observation explains why Lys 465 sidechain-nucleotide interactions are not reported in the X-ray GR DBD/GRE model (12), although Lys 465 has been shown by point mutation studies to be involved in both specific DNA binding and transcription activation by the GR protein (32). In contrast, the sidechain of Lys 465 is not constrained by Glu 469 in the NMR GR DBD model described above, and Lys 465 interacts with its cognate codon/anticodon nucleotide base sites as shown in figures 4c and 5a and tables 1 and 2. Furthermore, in the model derived from NMR atomic coordinates, the Glu 469 carboxyl oxygen forms H-bonds with Arg 466 sidechain donor sites, thus orienting the Arg 466 sidechain toward its cognate codon nucleotide A39 (see figure 5b). However, Arg 466 also interacts with G18, as observed in the X-ray GR DBD/GRE model (see tables 1 and 2).
The differences in amino acid-nucleotide interactions between the NMR GR DBD/GRE model described above and the X-ray GR DBD/GRE model (12) may also be due in part to differences in the GRE nucleotide sequences and major groove halfsite spacing used in the respective models. Indeed, for the X-ray crystallography-determined structures of the 434 cI repressor in complex with operator DNA (1), amino acid sidechain orientation was shown to differ among operators as a function of nucleotide sequence. In contrast to the 29 bp naturally occurring GRE and flanking nucleotides used in the NMR GR DBD/GRE model, the GRE used in the X-ray crystallography-determined GR DBD/GRE complex was an 18 bp synthetic oligonucleotide containing the TGTTCT recognition motif in both the right and left DNA major groove halfsites, abnormally spaced by 4 nucleotides (12) (see figures 6a-c).
Conformational changes in DNA structure resulting from protein/DNA interaction have also been associated with the nucleotide makeup and sequence length of the DNA (1, 26, 34). We observed considerable DNA conformational changes in the 29 bp GRE nucleotide sequence during molecular dynamics of the NMR GR DBD/29 bp GRE complex (see figures 3b, d, f and h). In contrast, no significant structural changes were reported in the 18 bp DNA sequence of the X-ray crystallography-determined GR DBD/GRE complex (12). In the NMR GR DBD/29 bp GRE model presented herein, a polyadenine sequence susceptible to DNA bending flanks the GRE major groove halfsites. We observed that bending of the GRE flanking nucleotide DNA occurred in part as a result of interaction with amino acids 510-517 of the predicted alpha helix encoded in exon 5. Nucleotide strands of the minor grooves flanking the GRE (the sense strand on the right and the antisense strand on the left), which share genetic similarity with the exon 5 sequence encoding the predicted alpha helix (see figure 1 and table 2), are drawn into the body of the GR DBD. These results for the NMR GR DBD/29 bp GRE model parallel exactly the GRE flanking nucleotide binding results for the GR protein determined by nuclease footprinting by Scheidereit et al. (18-19) (see figure 2).
Differences are reported to exist between the proteins of the X-ray GR DBD and NMR GR DBD models (12-13); however, the alignment of the GR DBD DNA recognition helix within the GRE right major groove halfsite of our NMR GR DBD/GRE models, as described above, was essentially identical to that of the X-ray crystallography GR DBD/GRE model (12). Similar structural differences were recently reported for the estrogen receptor (ER) DBD interacting with an estrogen response element (ERE) nucleotide sequence. X-ray crystallography of the ER DBD/ERE complex (35) revealed two structures of the ER DBD interacting with the DNA, one similar to the X-ray structure of the GR DBD/GRE complex (12) and one similar to the NMR-determined structure of the ER DBD alone (36). The latter agrees with the NMR-determined structure of the GR DBD alone (13) and accordingly resembles the NMR GR DBD models we used to study GR DBD/GRE interactions as described above. Furthermore, although the ER DBD structures differed, they presented the ER DNA recognition helix amino acid sidechains to the ERE DNA major groove halfsites in identical orientation and alignment (35).
CONCLUSIONS
Site specific DNA recognition by the GR protein has been reported to occur within the GRE major groove halfsite containing the conserved TGTTCT motif (17-19, 31). In addition, a family of octanucleotides related to the sequence AAGAACAG has also been detected in GRE sites in the long terminal repeat (LTR) DNA upstream of the mouse mammary tumor virus (MMTV) transcription start site (17). Earlier we reported that the c-DNA which encodes the GR DNA recognition helix has a high degree of nucleotide similarity with a GRE site within the MMTV LTR (5). This GRE site contains the sequence 5'-AAGAACAG-3'/5'-CTGTTCTT-3' within its right major groove halfsite. Recently we found a high degree of genetic similarity between this same GRE site and the c-DNA encoding a beta strand structure adjacent to the DNA recognition helix (11) (see figure 1). Upon examining the amino acid coding possibilities of this sequence in all reading frames on both strands, we found that codon sites for amino acids of the GR DNA recognition helix and beta strand were conserved (see figure 1c). It is significant that the codon sites are embedded in overlapping reading frames within pentanucleotides read 5'-to-3' and 3'-to-5'. For example, on the sense strand in the pentamer CTGTT, valine codon sites are found: GTC reading 3'-to-5' and GTT reading 5'-to-3'. Likewise, on the sense strand within the pentamer TTCTT, the phenylalanine codon TTC is found reading both 5'-to-3' and 3'-to-5'. Similarly, on the antisense strand, codon sites for arginine (AGA), lysine (AAG) and glutamic acid (GAA) are found in the pentamer AAGAA in overlapping frames reading 5'-to-3' and 3'-to-5'. Likewise, on the antisense strand, codon sites for glutamine are found in the pentamer AACAG: CAA reading 3'-to-5' and CAG reading 5'-to-3'.
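The overlapping reading frame examples above can be reproduced mechanically. The sketch below assumes the standard genetic code and, following the examples in the text, reads a pentamer 5'-to-3' as written and then 3'-to-5' by simply reversing it (no complementing); the function name `overlapping_codons` is hypothetical.

```python
from itertools import product

# Standard genetic code in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def overlapping_codons(seq):
    """List (direction, codon, amino acid) for every overlapping triplet in seq,
    read 5'->3' as written and 3'->5' by reversing (not complementing) the string."""
    hits = []
    for direction, reading in (("5'->3'", seq), ("3'->5'", seq[::-1])):
        for i in range(len(reading) - 2):
            codon = reading[i:i + 3]
            hits.append((direction, codon, CODON_TABLE[codon]))
    return hits


if __name__ == "__main__":
    # Pentamers discussed in the text (sense: CTGTT, TTCTT; antisense: AAGAA, AACAG).
    for pentamer in ("CTGTT", "TTCTT", "AAGAA", "AACAG"):
        print(pentamer, overlapping_codons(pentamer))
```

For example, CTGTT yields GTT (valine) reading 5'-to-3' and GTC (valine) reading 3'-to-5', while AAGAA yields AAG (lysine), AGA (arginine) and GAA (glutamic acid) in both directions because it reads the same either way.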
We also found, by model building of the GR DNA recognition helix docked at this GRE site, that the sidechains of amino acids Lys 461, Val 462, Phe 464, Lys 465, Arg 466 and Glu 469 were aligned with their cognate codon nucleotides within the GRE DNA right major groove halfsite (5). Likewise, the beta strand adjacent to the DNA recognition helix had amino acids aligned with their cognate codons (11). Our findings, reported herein, show that amino acids Lys 461, Lys 465 and Arg 466 of the GR DNA recognition helix, and Gln 471 of the adjacent beta strand encoded at the splice junction site of exons 3 and 4, specifically form both direct and water mediated bidentate H-bonds at their cognate codon/anticodon nucleotide base sites within the GRE right major groove halfsite, which contains the 5'-CTGTTCTT-3'/5'-AAGAACAG-3' recognition motif (see figures 5a-d and tables 1 and 2). In addition, Val 462 interacts by van der Waals contact with the middle nucleotide of its codon, GTT, and Glu 469 has strong electrostatic attraction to, and interacts by water mediated H-bonds with, the phosphate backbone of its codon nucleotide, GAA (see figures 4b,e and 5b and tables 1 and 2). Phenylalanine 464 forms a water mediated H-bond with the phosphate backbone of its anticodon (see table 1). Therefore, recognition of codon/anticodon nucleotides within the GRE DNA right major groove halfsite by amino acids of the GR DNA recognition helix offers an explanation for the GR DNA binding preference for the GRE major groove halfsite which contains the TGTTCT motif. Our findings also indicate that GR site specific DNA recognition involves overlapping reading frames. These observations offer an explanation for why more than one amino acid can interact with the same nucleotide, and vice versa, while still satisfying site specific DNA recognition according to our hypothesis.
Unlike the TGTTCT recognition motif, which is conserved within the right major groove halfsite of GREs, the nucleotide sequences of the GRE flanking regions are not conserved (33). However, we detected a high degree of nucleotide subsequence similarity between both flanks of a GRE and a nucleotide subsequence of GR DBD exon 5 which encodes a predicted alpha helix (11) (see figure 1). It is significant that this same GRE site, among the several located within the LTR nucleotide sequence upstream of the MMTV gene transcription start, has been reported to bind GR preferentially and to have the highest transcription enhancing activity (31). Therefore, our findings indicate that conservation of genetic information (5, 11), and the corresponding atomic interactions of amino acid sidechains of the GR DBD DNA recognition helix, beta strand and predicted alpha helix with cognate codon/anticodon nucleotides within a well characterized GRE and its flanking DNA sequence as reported herein, are correlated with both DNA site specific recognition and transcription enhancement.
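The scoring used in the genetic similarity search is not detailed in this section (see 5, 11). As a hypothetical illustration only, the simplest form of such a comparison slides a short query along a target sequence and its reverse complement and reports the best ungapped percent identity. The function names `reverse_complement` and `best_ungapped_match` are assumptions, and this sketch is not a substitute for a proper alignment tool.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def best_ungapped_match(query, target):
    """Best ungapped placement of query on either strand of target.

    Returns (strand, offset, identity): strand is "+" for target as given and
    "-" for its reverse complement; identity is the fraction of matching
    positions. Ties keep the first placement found.
    """
    if not query or len(query) > len(target):
        raise ValueError("query must be non-empty and no longer than target")
    best = ("+", 0, 0.0)
    for strand, seq in (("+", target), ("-", reverse_complement(target))):
        for offset in range(len(seq) - len(query) + 1):
            window = seq[offset:offset + len(query)]
            identity = sum(q == t for q, t in zip(query, window)) / len(query)
            if identity > best[2]:
                best = (strand, offset, identity)
    return best


if __name__ == "__main__":
    # GRE right halfsite sense strand vs. its antisense octanucleotide from the text.
    print(best_ungapped_match("AAGAACAG", "CTGTTCTT"))
```

Using the GRE site from the text, the antisense octanucleotide 5'-AAGAACAG-3' matches the sense strand 5'-CTGTTCTT-3' perfectly only on the reverse-complement strand, which is why a similarity search of double-stranded DNA needs to examine both strands.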
Using genetic similarity search and secondary structure prediction techniques, we were the first to locate and describe the GR DNA recognition helix (5, 11). Recently, using the same techniques, we located a putative DNA binding alpha helix at the carboxyl end of the GR DBD which has a high degree of genetic similarity with the flanking nucleotide regions of the GRE (11). We report herein that, during molecular dynamics, amino acids 510-517 of the predicted alpha helix encoded by exon 5 at the carboxyl terminus of the GR DBD interact with GRE flanking nucleotides at cognate codon/anticodon sites and induce DNA bending and unwinding. It is noteworthy that GR DBD amino acids 510-517 are reported to be partially responsible for nuclear localization of the GR protein and are related in sequence to the nuclear localization signal of the simian virus 40 (SV40) T-antigen (37). Therefore, our observations indicate that, in addition to nuclear localization, these GR amino acids (510-517) may also be important in secondary DNA binding events involved in transcription stimulation.
The results of the molecular dynamics described herein indicate that the GR DBD interacts with its naturally occurring cognate GRE and flanking nucleotides, resulting in DNA bending, strand separation, elongation and unwinding, events which have been associated with helicase activity and transcription initiation (15). Bending of DNA has recently been observed for other members of the steroid/thyroid receptor superfamily as a result of estrogen receptor and thyroid receptor interaction at their cognate response elements (26-27). Furthermore, bending of the DNA in our model is consistent with the observations that the GR DBD dimer interacts with four turns of DNA at the GRE, with amino acids located at the carboxyl end of the GR DBD interacting with the GRE flanking nucleotides at DNA backbone sites, as described by others (33).
Our findings described herein and elsewhere (5, 10-11) strongly support the idea of a stereochemical basis for the origin of the genetic code (38-39), because amino acids within the DNA recognition helices of regulatory proteins are consistently found oriented toward and lined up with cognate codon/anticodon nucleotides within the specific DNA binding major groove halfsites. These findings also suggest that these structures may have been template dependent in their evolution (i.e., peptides acting as templates for nucleotide polymerization or vice versa) (40-41). Our observations that genetic sequence similarity is conserved between the GRE and its flanking nucleotides and the nucleotide subsequences at the splice junction sites of exons 3, 4 and 5, which encode the DNA recognition helix, beta strand and predicted alpha helix, respectively, of the GR DBD, imply that these structures are primordial recognition modules which have been conserved. The modular makeup of the DNA at the GRE recognition site also supports this belief. The first amino acid of the beta strand adjacent to the DNA recognition helix, Gln 471, encoded at the splice junction site of exons 3 and 4, is aligned with the middle nucleotide base pair of the pentamer 5'-AACAG-3'/5'-CTGTT-3' within the right major groove halfsite of the GRE; this pentamer contains both Gln codons on the antisense strand, reading 5'-to-3' and 3'-to-5' from C (i.e., CAG and CAA). Glutamine 471 forms both direct and water mediated H-bonds with base sites on its anticodon nucleotide G18 on the sense strand within this pentamer (see tables 1 and 2 and figure 5c). Furthermore, the 5'-AACAG-3' sequence overlaps the 5'-AAGAA-3' palindrome located in the GRE right major groove halfsite, which is rich in codon nucleotides for the exon 3 encoded DNA recognition helix amino acids.
Amino acids of the DNA recognition helix form specific atomic interactions at their cognate codon/anticodon nucleotide base sites within the 5'-AAGAA-3' sequence, as described above (see tables 1 and 2 and figures 4a-e, 5a and 5b). Finally, as the molecular dynamics proceed, amino acids of the predicted alpha helix encoded at the exon 4 and 5 splice junction site, as described above (see figures 1 and 2), become aligned with and form H-bonds at the DNA backbone of their cognate codon/anticodon nucleotides within the GRE flanking nucleotide regions of the right major groove halfsite (see table 2 and figure 5d). It is remarkable that the DNA binding amino acids of the GR DBD are located predominantly within structures encoded at the splice junction sites of the three exons (3, 4 and 5) which encode the GR DBD. Therefore, we propose that prebiotic, template directed autocatalytic synthesis of mutually cognate peptides and nucleotides resulted in their amplification and evolutionary conservation in a contemporary eukaryotic organism as a modular genetic regulatory apparatus.
Our findings are consistent with our hypothesis that the underlying mechanism of a common site specific DNA recognition code for DNA regulatory proteins is based on stereochemical complementarity between the functional sites on amino acid sidechains and the base sites on their cognate codon/anticodon nucleotides. The data do not, however, explicitly support either the anticodonic model for the origin of the genetic code argued by Lacey et al. (42-44) or the codonic model suggested by the data of Yarus et al. (45-46). Our results indicate that genetic sequence analysis, secondary structure prediction and molecular model building, in accordance with our hypothesis, can be used as a predictive tool for determining specific sites on DNA regulatory proteins which recognize cognate DNA binding sites (5, 11). Our findings clearly illustrate the utility of molecular dynamics simulations as a tool for studying, in a stepwise manner, the complicated molecular events which occur during site specific DNA recognition by a DNA regulatory protein. In addition, the molecular dynamics results from the fully solvated 80 Angstrom water droplet model and the 10 Angstrom water layer model of the GR DBD/GRE are in agreement, indicating that the results are insensitive to the choice between these two solvation models. The direct and water mediated amino acid-nucleotide H-bonding interactions shown in tables 1 and 2 are identical in most instances.
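The statement that the H-bonding interactions from the two solvation models are identical "in most instances" could be quantified. One hypothetical way, sketched below, is to represent each model's interactions as a set of (amino acid, nucleotide, bond type) tuples and report the Jaccard index (shared interactions divided by all distinct interactions) together with the interactions unique to each model. The function name `compare_models` is an assumption, and the labels in the usage example are placeholders, not values from tables 1 and 2.

```python
def compare_models(droplet, layer):
    """Compare two collections of interaction tuples.

    Returns (jaccard, only_in_droplet, only_in_layer), where jaccard is
    |shared| / |union| (defined as 1.0 when both collections are empty).
    """
    droplet, layer = set(droplet), set(layer)
    union = droplet | layer
    jaccard = len(droplet & layer) / len(union) if union else 1.0
    return jaccard, droplet - layer, layer - droplet


if __name__ == "__main__":
    # Placeholder labels only; not taken from tables 1 and 2.
    droplet_80A = {("AA1", "NT1", "direct"), ("AA2", "NT2", "water"), ("AA3", "NT3", "direct")}
    layer_10A = {("AA1", "NT1", "direct"), ("AA2", "NT2", "water"), ("AA4", "NT4", "direct")}
    score, only_droplet, only_layer = compare_models(droplet_80A, layer_10A)
    print(f"Jaccard = {score:.2f}; droplet only: {only_droplet}; layer only: {only_layer}")
```

Listing the interactions unique to each model, not just the overall score, keeps the few disagreements visible for inspection.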
Because our molecular dynamics simulations were subnanosecond in duration (300 picoseconds), it can be argued that not enough conformational space was explored between the GR DBD and GRE to detect alternative binding arrays. However, we believe our model and findings are credible, since the atomic interactions of amino acids Lys 461, Val 462 and Arg 466 of the GR DNA recognition helix with nucleotides within the 5'-TGTTCT-3'/5'-AGAACA-3' recognition motif of the GRE right major groove halfsite agree with those reported for the GR DBD/GRE co-crystal complex (12). Furthermore, the observations described herein agree with published biological interactions of the GR DBD with the GRE determined from the laboratory bench "wetwork" findings of others (4, 9, 12-14, 17-19, 28, 30-33, 37, 48). Therefore, our GR DBD/GRE model is consistent with existing data. Finally, the findings reported herein strongly support our original prediction that conservation of genetic information is a determinant of site specific DNA recognition for DNA regulatory proteins (5, 10-11).
ACKNOWLEDGEMENTS
We thank Don Gregory of Molecular Simulations Inc. for providing geometry for explicit sodium counter-ions used in all simulations and for Zn atom placement and charge parameters for Zn binding cysteines in the "zinc fingers" of the GR DBD structures. We also thank the Molecular Simulations Inc. staff for software support with QUANTA and discussions on CHARMm, Michael Fenton of Cray Research Inc. for data reduction programs, Barry Bolding of Cray Research Inc. for CHARMm software optimization on the CRAY C-90, Minnesota Supercomputer Institute Scientific Director Don Truhlar for support and encouragement, the Minnesota Supercomputer Center user services representatives for technical support on the CRAY-2 and C-90, R. Kaptein for personal communication of GR NMR structural coordinates, and R. Lavery for providing CURVES 4.1 software; special thanks are due to Charlie Larson of Silicon Graphics Inc. for hardware support with the IRIS 4D 320-GTX workstation. We are sincerely grateful to Professor Thomas C. Spelsberg of the Department of Biochemistry and Molecular Biology, Mayo Foundation, Rochester, MN for preliminary review of the manuscript and encouragement. Finally, we thank Professor James C. Lacey Jr. of the Department of Biochemistry, The University of Alabama at Birmingham, who gave encouragement and valuable suggestions. This work was supported in part by a research grant from the Minnesota Supercomputer Institute, Minneapolis MN. This work was also supported by a research fellowship dedicated to the memory of William Lang Jr.
REFERENCES