
Introduction
Conventional NMR protein structures are derived from large numbers of short-range
inter-hydrogen distances extracted from peak heights in NOE spectra. Since quantitative
interpretation of NOE data requires prior knowledge of both the protein structure and its
overall dynamics, NOE distances are generally used in a qualitative way during a typical
structure determination (i.e. binned into short-distance, medium-distance, and long-distance classes).
These short-range distances are commonly supplemented by J-coupling values, which arise from
interactions taking place through chemical bonds. Since these interactions generally span
only one, two, or three chemical bonds, they are also short-range in nature. J-couplings
can be measured with great precision. However, they are interpreted on the basis of
empirical calibration curves (Karplus curves) which relate J-coupling values to molecular
torsion angles. These relationships are often approximate, and they are commonly
ambiguous, which is to say that a given observed J-coupling might be consistent with two
or more very different torsion angles.
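The ambiguity of a Karplus curve can be illustrated with a short sketch. The coefficients below are illustrative values assumed for a 3J(HN-HA) coupling and are not given in this text; the function names are likewise hypothetical. The sketch scans the backbone angle phi and reports every angle whose predicted coupling matches a given observed value.

```python
import math

# Illustrative Karplus coefficients in Hz (assumed for this sketch only).
A, B, C = 6.51, -1.76, 1.60

def karplus(phi_deg):
    """Predicted 3J coupling (Hz) for a backbone torsion angle phi in degrees."""
    t = math.radians(phi_deg - 60.0)
    return A * math.cos(t) ** 2 + B * math.cos(t) + C

def torsions_for(j_obs, step=0.5):
    """Return all phi in [-180, 180) whose predicted coupling equals j_obs."""
    roots = []
    n = int(round(360.0 / step))
    for k in range(n):
        lo = -180.0 + k * step
        hi = lo + step
        f_lo = karplus(lo) - j_obs
        f_hi = karplus(hi) - j_obs
        if f_lo * f_hi < 0:              # a sign change brackets a root
            for _ in range(40):          # refine the bracket by bisection
                mid = 0.5 * (lo + hi)
                f_mid = karplus(mid) - j_obs
                if f_mid * f_lo <= 0:
                    hi = mid
                else:
                    lo, f_lo = mid, f_mid
            roots.append(0.5 * (lo + hi))
    return roots

solutions = torsions_for(7.0)
for phi in solutions:
    print(f"phi = {phi:7.2f} deg gives J = {karplus(phi):.2f} Hz")
```

With these assumed coefficients, a single observed coupling of 7.0 Hz is matched by two torsion angles that lie far apart, which is the ambiguity described above.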
If we consider the complexity of this short-range information, even for a modest protein
fold like the 76-residue ubiquitin, the difficulty of conventional NMR structure
determination becomes apparent. Without counting stereospecific interactions, or
interactions within residues, the H atoms in ubiquitin comprise a network of more than 1400 interactions
where nuclei are within 5 angstroms. It is this complicated network which must be
characterized in some way in order to conduct a conventional NMR structure
determination.
So, in order to make NMR structure calculation simpler, we would like to find ways
which do not require analysis of such a complex network of interactions. And, in order to
make NMR structure calculation more precise, we would like to rely primarily on
parameters which can be interpreted quantitatively.
Chemical Shifts
In the first stages of NMR structure determination, we commonly assign the chemical
shifts of the backbone atoms. These chemical shifts are strongly correlated with residue
type, as shown in the plot of CAlpha vs CBeta chemical shift values, colorized by amino acid
type. At one extreme lie Ala residues, at the other, Ser and Thr. We can roughly
compensate for residue-type differences by subtracting residue-specific
random coil shift values, to generate secondary chemical shifts. An example is shown in the
plot of CAlpha vs CBeta secondary chemical shift values colorized by residue type. As indicated in the
figure, the secondary shift distribution is roughly similar for all residue types, and is not
random.
As a clue to the information contained in secondary chemical shifts, consider the plot of
CAlpha vs CBeta secondary shift for ubiquitin residues, colorized by structural motif.
As shown, secondary shifts from helical residues tend to have values which are different
from secondary shifts of residues in beta-sheets. In other words, the backbone secondary
chemical shifts contain information about the backbone structure.
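As a minimal sketch of the subtraction described above, the snippet below computes secondary shifts from observed shifts. The random-coil numbers are hypothetical placeholders chosen for illustration, not a published table, and the function name is an assumption.

```python
# Hypothetical random-coil reference shifts in ppm (placeholders for illustration).
RANDOM_COIL = {
    "ALA": {"CA": 52.5, "CB": 19.1},
    "SER": {"CA": 58.3, "CB": 63.8},
}

def secondary_shift(residue_type, atom, observed_ppm):
    """Observed shift minus the residue-specific random-coil value."""
    return observed_ppm - RANDOM_COIL[residue_type][atom]

# Example: an alanine with invented observed shifts.
delta_ca = secondary_shift("ALA", "CA", 55.0)
delta_cb = secondary_shift("ALA", "CB", 18.1)
print(delta_ca, delta_cb)
```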
TALOS-N: Prediction of Backbone and Sidechain Angles from Chemical Shifts
In an attempt to exploit this secondary shift information quantitatively, we originally used a simple
database mining approach, implemented in the TALOS system. In this system, we have a
database of known high-resolution structures and their measured chemical shifts. Then,
given secondary chemical shifts of a triplet of residues in an unknown protein, we can
search the database for triplets which have similar secondary shifts. If we find several
good matches in the database, we can assume that the backbone angles of the central
residues in a database triplet will be good predictors for the phi and psi angles in the
unknown protein. In practice we assemble a list of several of the best matches from the
database, which originally contained only 20 high-resolution protein structures.
Cross-validation can be used to characterize TALOS by testing how
well each given known protein could be analyzed based on the remaining proteins in the
database. For most residues, there is a clear consensus of phi and psi values
in the best database matches, and in these "good" cases, the average and standard
deviations of the phi and psi angles from the database are used as quantitative predictors
for the backbone angles in the target protein. In the remaining cases, there is no
consensus on phi and psi angles from the closest database matches; these "ambiguous" cases are not used for prediction purposes.
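The triplet search and consensus test described above can be sketched as follows. This is a toy illustration and not the TALOS implementation: the database entries, the Euclidean distance measure, the number of matches, and the 20-degree consensus threshold are all assumptions made for the example.

```python
import math
import statistics

# Toy database: (secondary shifts for a residue triplet, (phi, psi) of the central residue).
# All numbers are invented for illustration.
DATABASE = [
    ((2.9, -0.4, 3.1, -0.6, 2.7, -0.3), (-62.0, -41.0)),
    ((3.2, -0.5, 2.8, -0.2, 3.0, -0.7), (-65.0, -39.0)),
    ((2.6, -0.1, 3.3, -0.5, 2.9, -0.4), (-60.0, -45.0)),
    ((-1.4, 2.1, -1.8, 2.5, -1.2, 1.9), (-120.0, 135.0)),
    ((-1.6, 2.4, -1.5, 2.0, -1.7, 2.2), (-115.0, 130.0)),
    ((0.1, 0.2, -0.2, 0.1, 0.3, -0.1), (-80.0, 150.0)),
]

def shift_distance(a, b):
    """Euclidean distance between two secondary-shift vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def predict_phi_psi(query, n_best=3, max_sd=20.0):
    """Average the central-residue angles of the n_best closest database triplets."""
    best = sorted(DATABASE, key=lambda entry: shift_distance(query, entry[0]))[:n_best]
    phis = [angles[0] for _, angles in best]
    psis = [angles[1] for _, angles in best]
    # Plain means ignore angle wrap-around at +/-180; adequate for this toy data.
    phi_sd = statistics.pstdev(phis)
    psi_sd = statistics.pstdev(psis)
    return {
        "phi": (statistics.mean(phis), phi_sd),
        "psi": (statistics.mean(psis), psi_sd),
        "consensus": phi_sd <= max_sd and psi_sd <= max_sd,
    }

RESULT = predict_phi_psi((3.0, -0.4, 3.0, -0.4, 2.8, -0.4))
print(RESULT)
```

When the closest matches disagree, the standard deviations grow and the `consensus` flag becomes false, which corresponds to the "ambiguous" cases that are excluded from prediction.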
Since the initial development of TALOS, improved versions have added a larger database of protein structures,
and the database search is now augmented by combining search results with predictions from an Artificial Neural Network.
The latest version, TALOS-N, can also make predictions
about sidechain orientation. TALOS-N provides phi and psi estimates to better than 15 degrees RMS, although for
about 3.5% of the predictions, the predicted angles are substantially different from
the angles found in the reference X-ray structures. However, a substantial fraction of this 3.5% appears to reflect
genuine differences relative to the crystalline state, and the true error rate is therefore believed to be considerably lower.
TALOS-N can be used via a web-based server:
spin.niddk.nih.gov/bax/nmrserver/talosn.
SPARTA+: Prediction of Backbone Chemical Shifts from Protein Structure
Given the information in the TALOS database, it is also possible to estimate backbone
chemical shifts for a proposed structure. The simplest approach uses the database
information to create Ramachandran surfaces of secondary chemical shift distribution with
respect to phi and psi. Chemical shifts for a specific phi/psi value can be found
simply by extracting the secondary shift from the given phi/psi point in the surface, and
then adding a suitable random coil value. This simple approach, based on phi/psi values
for individual residues, predicts backbone shifts with accuracies of CAlpha: 1.12 ppm, CBeta: 1.20 ppm,
C': 1.29 ppm, N: 3.10 ppm, HN: 0.67 ppm, and HAlpha: 0.36 ppm.
Even better predictions can be obtained via the Artificial Neural Network approach used by TALOS-N, as
implemented in the SPARTA+ program. SPARTA+
predicts backbone shifts with accuracies of CAlpha: 0.92 ppm, CBeta: 1.13 ppm, C': 1.07 ppm, N: 2.45 ppm,
HN: 0.49 ppm, and HAlpha: 0.25 ppm.
SPARTA+ can also be used via a web-based server: spin.niddk.nih.gov/bax/nmrserver/sparta.
Dipolar Couplings
In an isotropically tumbling molecule, dipole-dipole interactions are averaged to zero.
But, if the molecule is in the presence of an aligned medium such as a liquid crystal, the
molecule will interact with the aligned medium, and will no longer tumble isotropically.
Then, dipole-dipole interactions will no longer be averaged to zero, resulting in a dipolar
coupling. These dipolar couplings can generally be measured by the same methods used
to find J-couplings. And, the mathematical form for dipolar couplings can be described
exactly for a rigid molecule, as follows.
For the purpose of deriving the resonance frequencies (i.e., splittings produced by dipolar coupling), only
the z component of the local field of one nuclear dipole at the
position of the second nucleus is relevant (secular approximation). So, for spins of atoms A and B:
Dipolar Splitting: H_{dd} = D^{AB}_{max} \langle I_{Az} I_{Bz} (3\cos^{2}\theta - 1) \rangle
where
the \langle \rangle brackets refer to the time or ensemble average, which are
equivalent for isotropic and aligned solution,
\theta is the angle between the A-B internuclear vector and the magnetic
field, and:
D^{AB}_{max} = \mu_{0} (h/2\pi) \gamma_{A} \gamma_{B} / (4\pi^{2} r_{AB}^{3})
is the dipolar interaction value, which is the dipolar coupling that would be
observed for the static (completely aligned) molecule.
The constant \mu_{0}
is the magnetic permeability of vacuum, h is Planck's constant,
\gamma_{X} is the magnetogyric ratio of
spin X, and r_{AB} is the distance between nuclei A and B.
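As a numerical check of the dipolar interaction value, the sketch below evaluates the expression for an amide HN-N pair. The N-H distance of 1.04 angstroms and the approximate magnetogyric ratios are assumptions of this example and are not stated in the text; the function reports the magnitude only, ignoring the sign contributed by the negative 15N magnetogyric ratio.

```python
import math

MU_0 = 4.0e-7 * math.pi        # vacuum permeability, T m / A
H_PLANCK = 6.62607015e-34      # Planck's constant, J s

# Approximate magnetogyric ratios in rad s^-1 T^-1 (assumed values).
GAMMA = {"H": 2.6752218744e8, "N": -2.7116e7, "C": 6.728284e7}

def d_max(nucleus_a, nucleus_b, r_angstrom):
    """Magnitude (Hz) of mu_0 (h/2pi) gamma_A gamma_B / (4 pi^2 r^3)."""
    r = r_angstrom * 1.0e-10   # angstroms to metres
    value = (MU_0 * (H_PLANCK / (2.0 * math.pi)) * GAMMA[nucleus_a] * GAMMA[nucleus_b]
             / (4.0 * math.pi ** 2 * r ** 3))
    return abs(value)

d_hn = d_max("H", "N", 1.04)   # assumed N-H distance of 1.04 angstroms
print(f"HN-N dipolar interaction value: {d_hn:.0f} Hz")
```

The result is close to 21.6 kHz. Because the value scales as the inverse cube of the distance, small differences in the assumed bond length shift it noticeably.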
Some common dipolar interaction values
for protein dipolar couplings are given in the following table. Note that these static couplings are very large; this means
that in practice, a weak alignment of about one part in 10^{3} or 10^{4} will suffice to give rise to
a measurable coupling:
Atom Pair A-B    Dipolar Interaction Value D^{AB}_{max}
HN-N             21,585 Hz
HN-C'             6,666 Hz
C'-N              2,609 Hz
HA-CA            44,539 Hz
C'-CA             4,285 Hz
CA-CB             4,150 Hz


The residual
dipolar splitting between spins A and B equals:
D^{AB} = D^{AB}_{max} \langle P_{2}(\cos\theta) \rangle, with P_{2}(x) = \frac{1}{2}(3x^{2} - 1).
If the molecule is rigid, the orientation
of the internuclear vector, r_{AB},
in an arbitrary molecular coordinate system can be described by the angles \alpha_{x}, \alpha_{y},
and \alpha_{z} between the vector
and the x, y, and z axes of the coordinate system. The angles \beta_{x},
\beta_{y}, and \beta_{z} define the instantaneous
orientations of each of these axes relative to the static magnetic field. With \cos\theta
being the scalar product between a unit vector in the internuclear direction
and a unit vector parallel to B_{0}, \langle P_{2}(\cos\theta) \rangle can be rewritten as:
\langle P_{2}(\cos\theta) \rangle = \frac{3}{2} \langle (\cos\beta_{x}\cos\alpha_{x} + \cos\beta_{y}\cos\alpha_{y} + \cos\beta_{z}\cos\alpha_{z})^{2} \rangle - \frac{1}{2}
With C_{i} = \cos\beta_{i} and c_{i} = \cos\alpha_{i}, this can be rewritten as:
\langle P_{2}(\cos\theta) \rangle = \frac{3}{2} [ \langle C_{x}^{2} \rangle c_{x}^{2} + \langle C_{y}^{2} \rangle c_{y}^{2} + \langle C_{z}^{2} \rangle c_{z}^{2} + 2 \langle C_{x} C_{y} \rangle c_{x}c_{y} + 2 \langle C_{x} C_{z} \rangle c_{x}c_{z} + 2 \langle C_{y} C_{z} \rangle c_{y}c_{z} ] - \frac{1}{2}
By writing S_{ij} = \frac{3}{2} \langle C_{i} C_{j} \rangle - \frac{1}{2} \delta_{ij},
where \delta_{ij} is the Kronecker
delta function, we obtain:
\langle P_{2}(\cos\theta) \rangle = \sum_{i,j \in \{x,y,z\}} S_{ij} \cos\alpha_{i} \cos\alpha_{j}
The 3x3 matrix S is commonly referred to as the Saupe
matrix, the Saupe order matrix, or simply the order matrix. As
\langle C_{x}^{2} \rangle + \langle C_{y}^{2} \rangle + \langle C_{z}^{2} \rangle = 1, the matrix S is traceless, and
with \langle C_{i} C_{j} \rangle = \langle C_{j} C_{i} \rangle,
S is also symmetric, and therefore
only contains five independent elements.
If the structure of the molecule
is known, then the \cos\alpha_{i} direction cosine factors can be computed from the atomic coordinates of spins A and B.
This is an important result, because it means that
the five independent elements of the Saupe matrix can generally be
solved by linear least squares methods, provided
that dipolar couplings for at least five internuclear vectors are
available. However, more measured couplings are required if any pair of
internuclear vectors is parallel, and in other special cases such as a set
that includes three mutually orthogonal interactions.
For macromolecules, many
more dipolar couplings are frequently measured, and S is overdetermined.
Its elements are then commonly determined using singular value decomposition.
If the Cartesian coordinates of spin A and spin B are {x_{A}, y_{A}, z_{A}} and {x_{B}, y_{B}, z_{B}}, we can
define the direction cosines in terms of the coordinates as:
\cos\alpha_{x} = X^{AB} = (x_{A} - x_{B})/r_{AB}
\cos\alpha_{y} = Y^{AB} = (y_{A} - y_{B})/r_{AB}
\cos\alpha_{z} = Z^{AB} = (z_{A} - z_{B})/r_{AB}
Then, the five coefficients s_{1} ... s_{5} can be
determined by SVD using the following basis set:
q_{1} = \frac{1}{2} \sigma^{-1} D^{AB}_{max} (3Z^{AB}Z^{AB} - 1)
q_{2} = \frac{1}{2} \sigma^{-1} D^{AB}_{max} (X^{AB}X^{AB} - Y^{AB}Y^{AB})
q_{3} = 2 \sigma^{-1} D^{AB}_{max} X^{AB}Y^{AB}
q_{4} = 2 \sigma^{-1} D^{AB}_{max} X^{AB}Z^{AB}
q_{5} = 2 \sigma^{-1} D^{AB}_{max} Y^{AB}Z^{AB}
where \sigma is the estimated uncertainty
in the measured coupling. The measured dipolar couplings D^{AB}
are used to build a set of equations for solution by SVD:
\sigma^{-1} D^{AB} = s_{1}q_{1} + s_{2}q_{2} + s_{3}q_{3} + s_{4}q_{4} + s_{5}q_{5}
Given SVD solutions for coefficients s_{1} ... s_{5},
the elements of the order matrix S follow from the condition that S is traceless
(S_{xx} + S_{yy} + S_{zz} = 0):
S_{xx} = -\frac{1}{2}(s_{1} - s_{2})
S_{xy} = S_{yx} = s_{3}
S_{yy} = -\frac{1}{2}(s_{1} + s_{2})
S_{xz} = S_{zx} = s_{4}
S_{zz} = s_{1}
S_{yz} = S_{zy} = s_{5}
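The fit of the five order matrix elements can be sketched in a few lines. To stay free of external dependencies, this example solves the least squares problem through the normal equations with Gaussian elimination, which is a substitute for the SVD named in the text and is less robust for ill-conditioned vector sets. The vectors, the test tensor, and all function names are invented for illustration. The sketch takes S_{xx} = -(s_{1} - s_{2})/2 and S_{yy} = -(s_{1} + s_{2})/2, the signs that keep S traceless.

```python
import math

def basis_row(vec, d_max, sigma=1.0):
    """Basis functions q_1..q_5 for one unit internuclear vector (X, Y, Z)."""
    x, y, z = vec
    w = d_max / sigma
    return [0.5 * w * (3 * z * z - 1), 0.5 * w * (x * x - y * y),
            2 * w * x * y, 2 * w * x * z, 2 * w * y * z]

def solve(a, b):
    """Solve the square system a.x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_saupe(vectors, couplings, d_max, sigma=1.0):
    """Least squares estimate of the order matrix elements from measured couplings."""
    rows = [basis_row(v, d_max, sigma) for v in vectors]
    rhs = [d / sigma for d in couplings]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(5)] for i in range(5)]
    atb = [sum(r[i] * y for r, y in zip(rows, rhs)) for i in range(5)]
    s1, s2, s3, s4, s5 = solve(ata, atb)
    return {"xx": -0.5 * (s1 - s2), "yy": -0.5 * (s1 + s2), "zz": s1,
            "xy": s3, "xz": s4, "yz": s5}

def unit(v):
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)

def coupling(vec, s, d_max):
    """D = D_max * sum_ij S_ij cos(alpha_i) cos(alpha_j) for a unit vector."""
    x, y, z = vec
    return d_max * (s["xx"] * x * x + s["yy"] * y * y + s["zz"] * z * z
                    + 2 * (s["xy"] * x * y + s["xz"] * x * z + s["yz"] * y * z))

# Invented traceless order matrix and internuclear vectors for a synthetic test.
TRUE_S = {"xx": -0.3e-3, "yy": -0.7e-3, "zz": 1.0e-3,
          "xy": 0.2e-3, "xz": -0.1e-3, "yz": 0.4e-3}
VECTORS = [unit(v) for v in [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0),
                             (1, 0, 1), (0, 1, 1), (1, 1, 1), (1, -1, 1)]]
D_MAX = 21585.0
COUPLINGS = [coupling(v, TRUE_S, D_MAX) for v in VECTORS]
FITTED = fit_saupe(VECTORS, COUPLINGS, D_MAX)
print(FITTED)
```

With noise-free synthetic couplings the fitted elements reproduce the input tensor; with real data the residuals of the fit indicate how well a candidate structure agrees with the measurements.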

The order matrix is real and
symmetric, and it therefore is always possible to define a molecular axis
system where S becomes diagonal. In a number of applications it can
be advantageous to work in this principal axis frame, where:
D^{AB}(\alpha_{x}, \alpha_{y}, \alpha_{z}) = \frac{3}{2} D^{AB}_{max} \{ [ \langle C_{x}^{2} \rangle c_{x}^{2} + \langle C_{y}^{2} \rangle c_{y}^{2} + \langle C_{z}^{2} \rangle c_{z}^{2} ] - \frac{1}{3} \}
where
\langle C_{i}^{2} \rangle corresponds to the probability of
finding the ith axis parallel to the
magnetic field. Only the relative
differences in the \langle C_{i}^{2} \rangle values contribute to the
residual dipolar coupling. So, writing
\langle C_{i}^{2} \rangle = \frac{1}{3} + A_{ii}, the
coupling can be expressed in polar coordinates (\theta = \alpha_{z}; c_{z} = \cos\theta; c_{x} = \sin\theta \cos\phi; c_{y} = \sin\theta \sin\phi) to yield:
D^{AB}(\theta,\phi) = \frac{3}{2} D^{AB}_{max} [\cos^{2}\theta A_{zz} + \sin^{2}\theta \cos^{2}\phi A_{xx} + \sin^{2}\theta \sin^{2}\phi A_{yy}]
Defining |A_{zz}| > |A_{yy}| > |A_{xx}|, and using A_{xx} + A_{yy} = -A_{zz}; 2\sin^{2}\phi = 1 - \cos 2\phi; and 2\cos^{2}\phi = 1 + \cos 2\phi,
this can be rewritten as:
D^{AB}(\theta,\phi) = \frac{3}{2} D^{AB}_{max} [P_{2}(\cos\theta) A_{zz} + \frac{1}{2} \sin^{2}\theta \cos 2\phi (A_{xx} - A_{yy})]
This leads to an expression of dipolar couplings in terms of alignment tensor parameters. From a graphical point of view,
the alignment tensor can be visualized as a 3D ellipsoid whose orientation corresponds with the axes of alignment, with the dimensions of
the ellipsoid along each axis given by the magnitudes of A_{zz}, A_{yy}, and A_{xx}. As noted above, by definition A_{zz} is the longest axis,
A_{yy} the next longest, and A_{xx} the shortest.
Defining an axial component of
the alignment tensor A_{a} = \frac{3}{2} A_{zz},
and a rhombic component, A_{r} = (A_{xx} - A_{yy}), results in:
D^{AB}(\theta,\phi) = D^{AB}_{max} [P_{2}(\cos\theta) A_{a} + \frac{3}{4} A_{r} \sin^{2}\theta \cos 2\phi]
Note that the maximum value for
\langle C_{i}^{2} \rangle is one, i.e., the maximum for A_{zz}
equals 2/3, and the maximum value for A_{a}
becomes one when the z axis of the
principal alignment tensor becomes fully aligned with the static field.
The above expression is often rewritten as:
D^{AB}(\theta,\phi) = D^{AB}_{a} [(3\cos^{2}\theta - 1) + \frac{3}{2} R \sin^{2}\theta \cos 2\phi]
where D^{AB}_{a} = \frac{1}{2} D^{AB}_{max} A_{a} is referred
to as the magnitude of the dipolar coupling tensor, which describes how strongly aligned the molecular system is, and R
= A_{r}/A_{a} is the rhombicity, which is the departure of alignment from axial symmetry.
When the molecular system has been rotated so that its coordinate axes
correspond with the alignment tensor axes, then the dipolar coupling can
be computed as follows (this is the form used for our fitting of
dipolar couplings by nonlinear least squares, for cases where there
are restraints on one or more tensor parameters):
D^{AB} = D^{AB}_{max} [D_{axial} (3Z^{AB}Z^{AB} - 1) + \frac{3}{2} D_{rhomb} (X^{AB}X^{AB} - Y^{AB}Y^{AB})]
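The polar and Cartesian forms can be compared numerically with a short sketch. It is written in terms of a tensor magnitude `d_a` (Hz) and a rhombicity, so that the Cartesian expression corresponds to the one above with D^{AB}_{max} D_{axial} = d_a and D^{AB}_{max} D_{rhomb} = d_a times the rhombicity; that mapping is an inference from comparing the two forms term by term and is not stated in the text. The numerical values are invented.

```python
import math

def rdc_polar(theta, phi, d_a, rhombicity):
    """Coupling from polar angles (radians) in the principal axis frame."""
    return d_a * ((3.0 * math.cos(theta) ** 2 - 1.0)
                  + 1.5 * rhombicity * math.sin(theta) ** 2 * math.cos(2.0 * phi))

def rdc_cartesian(x, y, z, d_a, rhombicity):
    """Same coupling from the direction cosines (X, Y, Z) of the internuclear vector."""
    return d_a * ((3.0 * z * z - 1.0) + 1.5 * rhombicity * (x * x - y * y))

D_A, R = 10.0, 0.4                                   # invented tensor parameters
THETA, PHI = math.radians(50.0), math.radians(30.0)  # invented orientation
X = math.sin(THETA) * math.cos(PHI)
Y = math.sin(THETA) * math.sin(PHI)
Z = math.cos(THETA)

POLAR = rdc_polar(THETA, PHI, D_A, R)
CARTESIAN = rdc_cartesian(X, Y, Z, D_A, R)
INVERTED = rdc_cartesian(-X, -Y, -Z, D_A, R)   # vector pointing the opposite way
MIRRORED = rdc_cartesian(X, -Y, Z, D_A, R)     # reflection through the xz plane
print(POLAR, CARTESIAN, INVERTED, MIRRORED)
```

All four values agree, because the coupling depends only on squares of the direction cosines. This is the source of the orientational ambiguity discussed next.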
A critical aspect of the dipolar
couplings is their dependence on \cos^{2}\theta,
which in practice means that there are two continuous ranges of
orientation for the internuclear coupling vector which are consistent with a
given coupling value, and they are mirror images of each other. A simple way to reduce this ambiguity is to
prepare two different types of aligned media, for example a neutral one
and a charged one. The nature of the interaction between the target molecule
and the alignment medium will be different in the two cases, resulting in
two different and independent alignment tensors.
This restricts the orientations to only those positions which are consistent with both
alignment tensors simultaneously: only the intersecting orientations will be consistent with
coupling values from both samples.
However, there will still generally be cases where more than one conformation
is consistent with the dipolar data.
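The effect of a second alignment medium can be illustrated by counting, on a coarse grid of orientations, how many directions reproduce a coupling in one medium and how many reproduce the couplings in both. The two tensors (given directly in Hz, with D^{AB}_{max} folded in), the 3-degree grid, and the 0.5 Hz tolerance are all hypothetical choices for this sketch.

```python
import math

def unit_vector(theta_deg, phi_deg):
    th, ph = math.radians(theta_deg), math.radians(phi_deg)
    return (math.sin(th) * math.cos(ph), math.sin(th) * math.sin(ph), math.cos(th))

def coupling(vec, s):
    """Coupling in Hz for a unit vector and a symmetric traceless 3x3 tensor s."""
    x, y, z = vec
    return (s[0][0] * x * x + s[1][1] * y * y + s[2][2] * z * z
            + 2.0 * (s[0][1] * x * y + s[0][2] * x * z + s[1][2] * y * z))

# Two hypothetical alignment tensors (Hz), expressed in the same molecular frame.
TENSOR_1 = [[-4.0, 0.0, 0.0], [0.0, -6.0, 0.0], [0.0, 0.0, 10.0]]
TENSOR_2 = [[6.0, 3.0, -2.0], [3.0, -9.0, 4.0], [-2.0, 4.0, 3.0]]

GRID = [unit_vector(t, p) for t in range(0, 181, 3) for p in range(0, 360, 3)]
TRUE = unit_vector(60, 45)               # the "real" bond orientation (on the grid)
OBS_1 = coupling(TRUE, TENSOR_1)
OBS_2 = coupling(TRUE, TENSOR_2)
TOL = 0.5                                # assumed measurement tolerance in Hz

ONE = [v for v in GRID if abs(coupling(v, TENSOR_1) - OBS_1) < TOL]
BOTH = [v for v in ONE if abs(coupling(v, TENSOR_2) - OBS_2) < TOL]
print(len(GRID), len(ONE), len(BOTH))
```

The second tensor removes most of the orientations allowed by the first, but the true direction and its inverse always survive, so some ambiguity remains.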
Within the context of a protein, residues
are arranged in preferred orientations relative to each other, and in most
cases, only one of these will be consistent with the collection of dipolar
couplings. So, one way to reduce the impact of ambiguity is to only consider
physically realistic protein conformations.
To help resolve potential ambiguity still further, we can employ secondary
structure information from chemical shifts.
The molecular alignment frame serves as a reference system which establishes
the relative orientation of one internuclear coupling vector with respect to
any other, regardless of how far apart these internuclear vectors may be. For example, in the case of
HN-N couplings, dipolar couplings report on the orientation of an HN-N bond vector
relative to any other HN-N bond vector. This long-range orientational information is
very different in nature from the short-range distance and torsion information
traditionally used in NMR structure calculation, and as we will show, it is a powerful
complement to short-range data.
Direct Applications of Dipolar Couplings
As already explained, dipolar coupling values are determined by orientation. So, dipolar
couplings for two parallel bond vectors should be identical, given scaling factors to
compensate for bond distance and magnetogyric ratios. This can form the basis for some
useful analysis. For example, side chain orientation can be estimated by testing how
dipolar couplings in the sidechain are correlated to parallel bonds in the backbone.
Similarly, relative stereochemistry for molecules with several stereochemical centers can
be identified by testing agreement of observed vs calculated dipolar couplings
for all possible stereoisomers, and finding the stereoisomer with the best agreement
between observed and calculated dipolar couplings.
Dipolar couplings can be extremely sensitive to small changes in molecular coordinates, and they can be used directly in conventional structure determination.
For example, consider the initial structure of a 69-residue protein fragment which has a
~7 Hz RMSD between observed and calculated HN-N dipolar couplings. This structure
can be refined by conventional simulated annealing so that the couplings match to better than 1 Hz RMSD, but the refined backbone
structure differs by less than 0.3 angstroms RMSD from the initial structure. As such,
dipolar couplings can reveal structural details which might be difficult to characterize
with NOE distances alone, for example the curvature of an isolated helix, as in the case of
micelle-bound alpha-synuclein.
Structure Determination from Dipolar Couplings as the Primary Data
As noted above, a critical aspect of the dipolar couplings is their dependence on
\cos^{2}\theta, which in practice means that there are two continuous ranges of orientation for the
internuclear coupling vector which are consistent with a given coupling value, and they
are mirror images of each other. As also noted, a simple way to reduce this ambiguity is
to introduce couplings measured under a second alignment tensor, which restricts the
orientations to only those positions which are consistent with both alignment tensors
simultaneously. Then, only the intersecting orientations will be consistent with coupling
values from both samples. This greatly reduces the ambiguity of dipolar couplings, but does not eliminate it.
It can be noted that bond vectors in a protein are not oriented randomly or uniformly. For
example, HN-N bond vector orientation surfaces for ubiquitin and DinI proteins show that
the distributions are systematic, and also different for the two proteins. This argues that when analyzing data for a sequence of many residues simultaneously,
it is not strictly necessary to consider every possible orientation of individual bond vectors when
exploiting dipolar couplings for protein structure determination.
Since protein residues are arranged in preferred orientations relative to
each other, one way to reduce the impact of ambiguity is to
consider only physically realistic protein conformations.
And, to help resolve potential ambiguity still further, we can employ
structure information from other NMR observables such as chemical shifts
in combination with dipolar coupling data. This leads to the database
mining approach called Molecular Fragment Replacement (MFR).
Molecular Fragment Replacement (MFR)
The central concept of MFR is straightforward: identify short fragments of
known high-resolution protein structures whose simulated NMR
parameters are a good match for the observed NMR parameters of
the target protein.
Then, use suitable methods to assemble these fragments into
larger elements of protein structure.
The NMR parameters can include any combination of chemical shifts,
dipolar couplings, J-couplings, sequential NOEs, etc., as well
as residue-type homology. For each individual parameter, such as chemical
shift, a score is computed on the basis of the RMS differences
between observed and predicted values. A linear combination of these
individual scores is used to form an overall MFR score which
is used to rank the fragments.
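The scoring scheme described above can be sketched as a weighted sum of per-parameter RMS differences. The fragment names, data values, and weights below are invented for illustration; they are not the weights used by the MFR software.

```python
import math

def rms(observed, predicted):
    """Root-mean-square difference between two equal-length lists."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))

def mfr_score(observed, simulated, weights):
    """Linear combination of per-parameter RMS differences; lower is better."""
    return sum(weights[kind] * rms(observed[kind], simulated[kind]) for kind in weights)

def rank_fragments(observed, fragments, weights, n_best=10):
    """Return up to n_best (score, name) pairs, best match first."""
    scored = sorted((mfr_score(observed, sim, weights), name)
                    for name, sim in fragments.items())
    return scored[:n_best]

# Invented observed data for a three-residue stretch of the target.
OBSERVED = {"shift": [2.9, -0.5, 3.1], "dipolar": [12.0, -4.0, 7.5]}
# Invented simulated parameters for three hypothetical database fragments.
FRAGMENTS = {
    "frag_a": {"shift": [3.0, -0.4, 3.0], "dipolar": [11.5, -3.8, 7.9]},
    "frag_b": {"shift": [-1.5, 2.0, -1.2], "dipolar": [2.0, 9.0, -6.0]},
    "frag_c": {"shift": [2.5, -0.9, 2.6], "dipolar": [9.0, -1.0, 5.0]},
}
WEIGHTS = {"shift": 1.0, "dipolar": 0.5}   # assumed relative weights

RANKED = rank_fragments(OBSERVED, FRAGMENTS, WEIGHTS)
for score, name in RANKED:
    print(f"{name}: {score:.3f}")
```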
In practice, MFR uses fragment sizes
of 5-10 residues, currently drawn from a subset of roughly 850 structures
in the
PDB database, all with resolutions better than 2.4 angstroms. This provides
a collection of more than 180,000 fragments, which is large enough
to ensure that all physically realistic short fragment structures are
represented. This means that MFR database mining can still
be applied to novel proteins which have no known homologous structure
in the PDB.
Our first proof-of-concept result is shown in
the MFR search results for
residues 1-16 of ubiquitin, where we tallied the three best
matching fragments in terms of simulated versus measured chemical
shifts and dipolar couplings. Dipolar couplings for HN-N,
HN-C', N-C', and HA-CA measured in two alignment media were employed.
Using these data in an MFR search, the three database fragments with the
best MFR scores all match the backbone structure of ubiquitin to better
than 1 angstrom RMSD.
In a typical MFR search, we find overlapping collections of fragments
which are best matches (i.e. lowest scores) according to the MFR scoring
procedure.
So for example, using a 7-residue fragment size, an MFR search will identify
the 10 database fragments which are the best match
for residues 1-7 in the target protein, the 10 best matches for residues
2-8 in the target protein, etc.
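The overlapping windows in this example can be enumerated with a one-line helper. The function name is hypothetical; the 76-residue length is that of ubiquitin, as given earlier.

```python
def fragment_windows(first_residue, last_residue, size):
    """Inclusive (start, end) residue ranges for every overlapping window."""
    return [(start, start + size - 1)
            for start in range(first_residue, last_residue - size + 2)]

WINDOWS = fragment_windows(1, 76, 7)
print(WINDOWS[:2], WINDOWS[-1], len(WINDOWS))
```

Each window would then receive its own list of best-matching database fragments, so that most residues are covered by several overlapping fragments.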
Such a collection of fragments can be visualized as a Ramachandran trajectory where each fragment is represented
as a collection of vectors connecting that fragment's phi,psi backbone angles
on a series of Ramachandran surfaces for the target residue sequence.
The Ramachandran trajectory of MFR
database fragments for ubiquitin is a good illustration of typical MFR results
where chemical shifts are used in combination with dipolar couplings from
two aligned media. In such results, there are two notable aspects.
First, for the majority of residues where measured NMR parameters are
available, MFR provides an unambiguous indication
of the phi,psi conformation. Second, even in regions where the MFR results
show structural diversity, there are generally some fragments which
are good representatives of the ideal structure. This argues that the
MFR search results should be a powerful and effective precursor to
structure determination, given suitable protocols for converting MFR
results into complete structures.
However, it must be noted that it is not easy to use phi,psi conformations
directly to build an entire protein structure. For example,
if the exact phi,psi backbone angles from the crystal structure of ubiquitin
are superimposed onto an ideal planar protein backbone geometry,
the new structure differs from the original by more than 4 angstroms RMS.
Nevertheless, it is possible to use MFR phi,psi values directly to build
and refine elements of structure from 10 to 50 amino acids long, without
the use of any distance restraints. In the ideal case of ubiquitin,
where 4 types of high-quality dipolar couplings are available in two media
for almost all residues, the entire protein fold can be determined from dipolar couplings and chemical
shifts alone, to a backbone RMS of better than 1 angstrom.
MFR Alignment Tensor Estimates without Prior Knowledge of Structure
As noted above, most of the fragments from an MFR search are good
representatives of the ideal structure of the target.
This means that the tensor magnitude and rhombicity estimates from the best MFR fragments should be good estimates for the alignment tensor parameters of the entire intact protein. These estimates
can be used in later structure refinement steps.
The predicted tensor parameters for the protein as a whole are
calculated from a weighted average of the tensor parameters from
the entire collection of MFR fragments.
The weighting is performed according to the structural consensus
over each given range of residues.
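A weighted average of this kind can be sketched as below. How the consensus weights are derived is not specified here, so the weights in the example are simply assumed inputs, and the fragment values are invented.

```python
def weighted_tensor_estimate(fragments):
    """fragments: list of (magnitude_hz, rhombicity, weight) tuples.

    Returns the weighted mean magnitude and rhombicity.
    """
    total = sum(weight for _, _, weight in fragments)
    magnitude = sum(m * weight for m, _, weight in fragments) / total
    rhombicity = sum(r * weight for _, r, weight in fragments) / total
    return magnitude, rhombicity

# Invented fragment estimates; the weight stands in for structural consensus.
ESTIMATE = weighted_tensor_estimate([(10.0, 0.30, 3.0), (9.0, 0.40, 1.0)])
print(ESTIMATE)
```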
This method can estimate
tensor magnitudes and rhombicities to within 0.5 Hz, on the scale of an HN-N coupling.
In the case where two alignment media are available, the MFR results can
likewise be used to compute the relative difference in orientation
between the two tensors, which can also be used as a restraint for later
structure refinement.
Dynamics Information from MFR
In the case of a flexible backbone structure, the local tensor magnitude is
scaled down by the internal dynamics order parameter. So, a simple plot of
MFR fragment tensor magnitude estimates versus fragment starting residue will reveal the location of flexible regions
as places in the graph where the tensor magnitude drops.
This same approach can also be used to identify cases where domains
within a protein have different alignment tensors.
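One simple way to locate such drops is to compare each fragment's tensor magnitude with the median over all fragments. The 70% cutoff and the magnitudes below are assumptions made for this sketch, not values from the MFR software.

```python
import statistics

def flexible_starts(magnitudes, fraction=0.7):
    """Starting residues whose fragment tensor magnitude is below fraction * median.

    magnitudes maps fragment starting residue -> estimated tensor magnitude (Hz).
    """
    reference = statistics.median(abs(m) for m in magnitudes.values())
    return sorted(res for res, m in magnitudes.items() if abs(m) < fraction * reference)

# Invented per-fragment magnitudes with a dip at residues 4 and 5.
MAGNITUDES = {1: 10.1, 2: 9.8, 3: 10.3, 4: 6.0, 5: 5.5, 6: 9.9}
FLAGGED = flexible_starts(MAGNITUDES)
print(FLAGGED)
```

A sustained shift in magnitude across a whole domain, as opposed to a local dip, would point to the second situation described above, where domains have different alignment tensors.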
GammaS Crystallin: A Practical Application of MFR
The MFR application to GammaS crystallin serves as
an example of practical, high-quality structure determination based primarily
on orientational restraints, supplemented by relatively small numbers
of easy-to-assign NOE distances.
GammaS is a 177-residue protein with two similar domains, for which
a homologous structure (GammaB) with 50% sequence identity is known.
Dipolar coupling data in two media were measured. In one medium,
measurements included 144 HN-N, 111 CA-CB, 150 CA-C', and 134 N-C'
dipolar couplings. In the second medium, measurements included
147 HN-N, 135 CA-CB, 153 CA-C', and 139 N-C' dipolar couplings.
Conformational exchange resulted in missing amide signals
for one residue in the N-terminal domain, and nine residues in
the C-terminal domain, so that most of the "missing" coupling data
is associated with the C-terminal domain.
Sidechain \chi_{1} angles were
estimated from
^{3}J_{NC\gamma} and
^{3}J_{C'C\gamma} couplings, and
\chi_{2} angles from ^{3}J_{C\gamma C\delta}.
A deuterated sample was used to obtain 179 amide-amide NOEs; however, none
of these represented interdomain contacts. So, a GammaS sample with 13C
labeling for methyl sidechains only was used to obtain 70 methyl-methyl NOEs,
and these included 6 interdomain distances.
Using this data, the MFR search was
conducted in two stages followed by a fragment refinement step.
In the first MFR search, dipolar couplings were fit to database
fragments using SVD, which is the default approach, and computationally
fast. In this case, the tensor magnitudes and rhombicities are allowed
to assume any value.
So, in some cases, fragments with a "non-ideal" shape can be made to
match the measured dipolar couplings by using tensor parameters
which are not truly representative of the target protein.
To improve this situation, the results of the first MFR search are
used to estimate reasonable tensor magnitude and rhombicity for the
two alignment media. Then, a second MFR search is performed, this
time with the tensor parameters held fixed at the estimated values.
This requires use of nonlinear least-squares fitting of the dipolar
couplings, which is much slower. However, this second search with
restrained tensor parameters leads to a collection of fragments with
fewer ambiguities. Finally, the fragments identified by this second
MFR search are subjected to conventional low-temperature simulated
annealing refinement, to make small adjustments in the fragments which
improve their overall agreement with the measured dipolar couplings.
In the case of GammaS, the resultant collection of refined fragments
has unambiguous phi,psi conformations for 90% of the residues.
Furthermore, these fragments have a high amount of structural consensus,
such that 50% of the residues have better than 5 degree RMS phi,psi
consensus, and 33% of the residues have better than 3 degree RMS
phi,psi consensus.
At this point, we have a collection of structural data which is
more or less ideal for 90% of the residues, but we don't know
in advance which residues are the "ideal" ones. We can use a simple
modification of a traditional annealing scheme to employ this data.
First, the MFR results are converted into phi,psi restraints
for all residues where there is a consensus phi,psi conformation
in all refined fragments. Then, these restraints are used in a conventional
simulated annealing protocol,
along with NOE distances, and the original
dipolar couplings themselves. In the high-temperature phase,
the force constants for the MFR-derived torsion restraints are held
high, so that the MFR results maintain the local structure at
early stages of the structure calculation.
During cooling, the force constant of the MFR torsions is reduced,
while the force
constant for the NOEs and individual dipolar couplings is increased.
So, as the structure approaches its ideal fold, the MFR torsion
restraints become less important, and the individual dipolar
couplings become more important in maintaining the local structure.
This allows the final structure a chance to overcome any incorrect
conformations in the MFR torsion restraints. In this case, the final MFR-derived structure of GammaS agrees with its homolog GammaB to
0.63 angstroms RMS for the N-terminal domain backbone, and 1.09
angstroms for the C-terminal domain. In particular, the N-terminal
agreement is among the best between any NMR structure and homolog.
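The opposing force-constant ramps used during cooling can be sketched as a simple schedule. The geometric interpolation, the number of cooling steps, and every numerical value below are hypothetical; they illustrate the shape of the protocol and are not the settings used for GammaS.

```python
def ramp(start, end, step, n_steps):
    """Geometric interpolation from start to end over n_steps cooling steps."""
    fraction = step / (n_steps - 1)
    return start * (end / start) ** fraction

def cooling_schedule(n_steps=5):
    """Force constants at each cooling step (all values hypothetical)."""
    rows = []
    for step in range(n_steps):
        rows.append({
            "k_torsion": ramp(200.0, 5.0, step, n_steps),  # MFR torsions: high to low
            "k_noe": ramp(2.0, 30.0, step, n_steps),       # NOE distances: low to high
            "k_dipolar": ramp(0.01, 1.0, step, n_steps),   # dipolar couplings: low to high
        })
    return rows

SCHEDULE = cooling_schedule()
for row in SCHEDULE:
    print({name: round(value, 3) for name, value in row.items()})
```

Early in the schedule the torsion term dominates and holds the local structure; by the end, the dipolar and NOE terms carry most of the weight.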
Summary
Dipolar couplings and MFR database mining form the basis for a
new approach to structure determination based primarily on
quantitative orientational restraints, supplemented by
small numbers of NOE distances. The MFR approach can also be used
to estimate tensor parameters without prior knowledge of the
structure, and to probe the dynamics of the molecular system.
In our applications so far, the MFR approach has been quicker
than conventional NMR structure
determination, and has yielded better quality structures.
It has also provided structural information for systems where NOE
data is not obtainable or is not revealing.
